Cuda

alexeyab
https://developer.nvidia.com/cuda-toolkit-archive Each version of CUDA works with a specific kernel; CUDA 11.0 uses kernel 5.4.0, for example. Click on the installation guide for Linux section of the version you want:
 * https://docs.nvidia.com/cuda/archive/11.0/cuda-installation-guide-linux/index.html archived install guide (11.0)
 * https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html latest install guide
Install the cuda*.run file after logging into the root console from the grub menu. Instructions on using multiple CUDA and cuDNN versions: medium.com multiple versions of cuda, and multi cuda, cudnn versions as per amazon siam install instructions.
 * nvidiadevguid.png

Modify the bashrc file after installing CUDA, as per the AlexeyAB github yolo install documentation. Run the CUDA install script with the --silent --toolkit --override options. Set LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64. Change the /usr/local/cuda symbolic link to point back to the default version.
 * https://towardsdatascience.com/managing-cuda-dependencies-with-conda-89c5d817e7e1 (or rather export environment variables pointing to the CUDA directories, as per Object tracking)
 * https://towardsdatascience.com/building-a-conda-environment-for-horovod-773bd036bf64
 * https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
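Repointing the /usr/local/cuda symbolic link can be sketched as a small shell function. This is only a sketch: the function name switch_cuda and its prefix argument are hypothetical, added so it can be tried in a scratch directory first, and it assumes each toolkit is installed as <prefix>/cuda-<version>.

```shell
# Hypothetical helper: repoint <prefix>/cuda at <prefix>/cuda-<version>.
# Assumes each toolkit lives in its own cuda-<version> directory.
switch_cuda() {
    prefix=$1
    version=$2
    target="$prefix/cuda-$version"
    if [ ! -d "$target" ]; then
        echo "no such toolkit directory: $target" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ln -sfn "$target" "$prefix/cuda"
}
```

On a real install this would be called from a root shell as switch_cuda /usr/local 9.0, since /usr/local is normally writable by root only.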

opencv compiled with cuda
https://pastebin.com/JhpnAmxn compiles OpenCV 4.5 with CUDA and cuDNN

https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/

https://medium.com/@patrickorcl/compile-with-nvcc-3566fbdfdbf ARCH flags for nvcc. The GeForce GTX 10 series is compute capability 6.1.
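As a sketch of how the ARCH flag is put together: nvcc accepts -gencode arch=compute_XY,code=sm_XY, where XY is the compute capability with the dot removed. The helper name gencode_flag is hypothetical.

```shell
# Hypothetical helper: turn a compute capability such as 6.1
# into the matching nvcc -gencode flag.
gencode_flag() {
    cc=$(printf '%s' "$1" | tr -d '.')    # 6.1 -> 61
    printf '%s\n' "-gencode arch=compute_${cc},code=sm_${cc}"
}

gencode_flag 6.1    # prints: -gencode arch=compute_61,code=sm_61
```

It could then be used as nvcc $(gencode_flag 6.1) -o app app.cu, where app.cu stands for whatever source file is being built.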

gist
https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07 333 stars

https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

cudnn install

https://linuxconfig.org/how-to-install-the-latest-nvidia-drivers-on-debian-9-stretch-linux nvidia xconfig

kernel selection
After installing 4.4.115, select it from the advanced options entry of the grub menu on startup. Then remove the newer kernel so that only 4.4.115 remains:
 * uname -r    #shows the running kernel, should now be 4.4.115
 * sudo apt purge linux-image-4.13.0-26-generic
 * sudo apt purge linux-headers-4.13.0-26-generic
 * apt list --installed 'linux-image*'    #lists the installed kernels
 * apt list --installed 'linux-headers*'
 * sudo update-initramfs -u
 * sudo reboot
Notes on kernel choice:
 * On Ubuntu 16.04 LTS select kernel 4.4.115 (drop to 4.4.0 if this doesn't work) with the Ukuu kernel selection tool. Kernel 4.13 works only on some motherboards; it isn't stable enough. See devtalk nvidia cuda 9 fail ubuntu 16.04 with kernel 4.4.13. See linux kernel
 * Nvidia fan control: set the fixed fan speed of all the cards.
 * http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html lists the linux kernel version to use for the relevant distros.
 * https://developer.nvidia.com/cuda-toolkit-archive select version 8, 9 or 9.1
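Because each CUDA release is tied to particular kernels, it can help to check the running kernel before starting the installer. A minimal sketch; the function name kernel_is is hypothetical and the version strings are only examples.

```shell
# Hypothetical helper: succeed if the kernel release starts with the wanted version.
# The release defaults to the running kernel as reported by uname -r.
kernel_is() {
    wanted=$1
    release=${2:-$(uname -r)}
    case $release in
        "$wanted"|"$wanted"[-.]*) return 0 ;;    # exact match, or followed by - or .
        *) return 1 ;;
    esac
}

if kernel_is 4.4.115; then
    echo "running kernel matches 4.4.115"
else
    echo "running $(uname -r), reboot into 4.4.115 first"
fi
```

Requiring a "-" or "." after the wanted version keeps 4.4 from matching an unrelated release such as 4.40.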

wget -nc -O cuda_8.0.61_375.26_linux.run https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run #for Ubuntu 16.04, GTX 1070 etc. series

Press Ctrl+Alt+F4 at the log-in screen to switch to a text console, then type in your user name and password at the command line.

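Open the blacklist file with sudo nano /etc/modprobe.d/blacklist.conf and add these lines:
 * blacklist amd76x_edac
 * blacklist vga16fb
 * blacklist nouveau
 * blacklist rivafb
 * blacklist nvidiafb
 * blacklist rivatv
Then remove the existing NVIDIA packages, stop the display manager and run the installer:
 * sudo apt-get remove --purge 'nvidia*'
 * sudo service lightdm stop
 * cd ~/Downloads
 * chmod +x cuda_8.0.61_375.26_linux.run
 * sudo ./cuda_8.0.61_375.26_linux.run  #follow prompts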
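The blacklist step could also be scripted instead of edited by hand in nano. A minimal sketch; the function name add_blacklist is hypothetical, and the file path is an argument so that it can be tried on a scratch file first.

```shell
# Hypothetical helper: append "blacklist <module>" lines to a modprobe config file,
# skipping modules that are already listed, so it is safe to run twice.
add_blacklist() {
    file=$1
    shift
    touch "$file"
    for module in "$@"; do
        # -x whole-line match, -F fixed string, -q quiet
        if ! grep -qxF "blacklist $module" "$file"; then
            echo "blacklist $module" >> "$file"
        fi
    done
}
```

From a root shell it would be called as add_blacklist /etc/modprobe.d/blacklist.conf amd76x_edac vga16fb nouveau rivafb nvidiafb rivatv.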

See LinuxNotes for tutorial on adding LD_LIBRARY_PATH. After installing run these commands:

setpath
https://www.linuxslaves.com/2016/05/3-ways-fix-ubuntu-gets-stuck-login-loop.html

https://github.com/markjay4k/Install-Tensorflow-on-Ubuntu-17.10-/blob/master/Tensorflow%20Install%20instructions.ipynb from Yolo Mark Jay.

Append to ~/.bashrc:

export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
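The ${VAR:+:${VAR}} part of these exports expands to a colon plus the old value only when the variable is already set and non-empty, so no stray trailing colon is left behind when it was empty. A quick demonstration with a throwaway variable (DEMO is just an illustrative name):

```shell
# DEMO is a throwaway variable, not something CUDA reads.
DEMO=""
DEMO=/usr/local/cuda-9.2/lib64${DEMO:+:${DEMO}}
echo "$DEMO"    # /usr/local/cuda-9.2/lib64 (no trailing colon)

DEMO=/opt/lib
DEMO=/usr/local/cuda-9.2/lib64${DEMO:+:${DEMO}}
echo "$DEMO"    # /usr/local/cuda-9.2/lib64:/opt/lib
```

This matters for LD_LIBRARY_PATH in particular, because an empty entry in that list is treated as the current directory.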

https://askubuntu.com/questions/210884/setting-ld-library-path-for-cuda

Register the library directory with the dynamic linker:

echo "/usr/local/cuda-9.2/lib64" | sudo tee -a /etc/ld.so.conf.d/nvidia.conf

sudo ldconfig

Now run sudo service lightdm restart. This places you back into the GUI.

Booting to a blank screen without even reaching the log-in page? Try this: press “e” at the grub menu to edit the boot options, and add “nomodeset” right after “quiet splash” near the bottom line. Sometimes after updating your OS you will need to repeat this process, minus the driver blacklisting part.

Linux Mint users need to run the command “sudo service mdm stop” in place of “sudo service lightdm stop”. With some motherboards, it won't even boot up with GTX 1070s installed. It either reboots endlessly, boots to the BIOS or a black screen, or insists that no keyboard is installed. Remove the GTX cards and use the onboard display driver, then blacklist under modprobe.d. Then power off and reinsert the GTX 1070s.

cuda 9
Multiple versions of CUDA can be installed side by side. Use the which command to find the active one: which nvcc. Go to that directory (cuda8) and symbolic link all the binaries in /usr/local/cuda-8.0/bin/ (green colored) to the /usr/local/cuda-9.0/bin directory. This assumes that cuda-8.0 was placed in the path.


 * 1) https://archive.archlinux.org/packages/c/cuda/ for tar file
 * 2) https://wiki.archlinux.org/index.php/GPGPU#Development_2
 * 3) https://wiki.archlinux.org/index.php/NVIDIA
 * 4) https://docs.kali.org/general-use/install-nvidia-drivers-on-kali-linux

cudnn
http://shitalshah.com/p/debugging-tensorflow-dll-importerror/ from https://github.com/tensorflow/tensorflow/issues/14946

errors
https://stackoverflow.com/questions/46584000/cmake-error-variables-are-set-to-notfound CMake error: variables are set to NOTFOUND. This error occurs with OpenCV 3.3.0; try 3.4.0.

dkms status
https://codeyarns.com/2013/02/07/how-to-fix-nvidia-driver-failure-on-ubuntu/ You may still have some NVIDIA modules stuck in the kernel. First list the kernel modules with dkms status, then remove the stale one, for example:

sudo dkms remove nvidia-current-updates/304.64 -k 3.2.0-37-generic
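The module name, version and kernel in that command come straight from the dkms status listing. A sketch of turning the listing into remove commands; it assumes the older comma-separated output format ("name, version, kernel, arch: state"), newer dkms releases print a different layout, and the function name dkms_remove_cmds is hypothetical. It only prints the commands, it does not run them.

```shell
# Hypothetical helper: read `dkms status` lines on stdin and print the matching
# `dkms remove` command for every nvidia module.
# Assumes the older format: "name, version, kernel, arch: state".
dkms_remove_cmds() {
    awk -F', ' '/^nvidia/ { print "sudo dkms remove " $1 "/" $2 " -k " $3 }'
}

echo "nvidia-current-updates, 304.64, 3.2.0-37-generic, x86_64: installed" | dkms_remove_cmds
# prints: sudo dkms remove nvidia-current-updates/304.64 -k 3.2.0-37-generic
```

On a real system the input would be dkms status | dkms_remove_cmds, and the printed commands can be reviewed before running them.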

nouveau
https://askubuntu.com/questions/957821/how-to-completely-disable-nouveau-as-other-tutorials-arent-helping blacklist nouveau

https://askubuntu.com/questions/481414/install-nvidia-driver-instead-of-nouveau If you still get the error related to nouveau drivers then you are probably required to update the initramfs, which might be configured to load the nouveau drivers. Don't reboot or poweroff, run this command to update the initramfs disk.

Run sudo update-initramfs -u, then reboot and repeat these steps:
 * Ctrl+Alt+F4
 * sudo service lightdm stop  #sudo /etc/init.d/gdm stop on older systems
 * sudo ./cuda_8.0.61_375.26_linux.run    #run the driver package again.
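Before rerunning the installer it may be worth confirming that nouveau is really gone. A minimal sketch that scans lsmod-style output; the function name nouveau_loaded is hypothetical.

```shell
# Hypothetical helper: succeed if nouveau appears as a loaded module.
# Reads lsmod-style text on stdin; the first column is the module name.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# On a real system:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```

If it still reports nouveau after the blacklist and the initramfs update, the blacklist file is probably not being picked up.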

apt install
http://www.pradeepadiga.me/blog/2017/03/22/installing-cuda-toolkit-8-0-on-ubuntu-16-04/ This only works with kernel 4.4.115 or 4.4 on Ubuntu 16.04 LTS; use the Ukuu kernel selection tool to obtain the kernel.

Nvidia download page
http://download.nvidia.com/XFree86/
 * http://download.nvidia.com/XFree86/Linux-x86_64/384.111/NVIDIA-Linux-x86_64-384.111.run
 * https://developer.nvidia.com/cuda-80-ga2-download-archive CUDA 8 for ubuntu 16.04

intel bios
http://ethosdistro.com/kb/#booting-intel-boards

cuda
https://asierarranz.github.io/Razer-Blade-1060GTX-CUDA-cuDNN-Ubuntu/

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb

sudo apt-get update # (here you could get an "Invalid Date" warning, you can ignore it)

sudo apt-get install cuda # (wait, around 2 GB will be downloaded)

aptitude
For GTX 1070, 1050, 1060, 1080 Ti and Titan X
 * sudo apt install -y aptitude
 * aptitude search cuda   #lists cuda drivers.
 * aptitude search nvidia #lists nvidia drivers.

ERRS
Missing libraries: libGLU.so, libX11.so, libXi.so, libXmu.so
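One way to see which of these libraries the dynamic linker already knows about is to scan the ldconfig -p listing. A minimal sketch; the function name missing_libs is hypothetical and the match is a plain substring test, so it is only a rough check.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and print every
# library name given as an argument that does not appear in it.
missing_libs() {
    cache=$(cat)
    for lib in "$@"; do
        case $cache in
            *"$lib"*) ;;           # found somewhere in the linker cache listing
            *) echo "$lib" ;;      # not found
        esac
    done
}

# On a real system:
#   ldconfig -p | missing_libs libGLU.so libX11.so libXi.so libXmu.so
```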

links

 * nccl, GrubAndLilo
 * Nvidia linux drivers
 * Berkeleydb
 * GPU coolant for gpu

https://www.linkedin.com/pulse/installing-nvidia-cuda-80-ubuntu-1604-linux-gpu-new-victor/