Configuring a TensorFlow Training Environment with Docker

Neural Networks

Posted by Ccloud on 2022-08-23

This post walks through pulling a ready-made Docker image and configuring it as an environment for training deep neural networks.

Docker environment setup

# Install nvidia-container-runtime for Docker so that containers can use the host's NVIDIA GPU
# Tutorial: http://www.manongjc.com/detail/24-qcliirtikyklgea.html
# Add the NVIDIA repository:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
# Install the runtime:
sudo apt-get install nvidia-container-runtime
# Stop Docker:
sudo systemctl stop docker
# Register the runtime with Docker (this runs the daemon in the foreground):
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime

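# Replace [your name] with a container name of your choice
docker run --gpus=all -dit -v /home/henry/asc:/home/asc --name [your name] tensorflow/tensorflow:2.6.1-gpu /bin/bash
# Open a shell in the running container ([container id] may be the ID or the name given above)
docker exec -it [container id] /bin/bash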
# Set up the environment (inside the container)
apt update
apt upgrade
apt install git cmake vim
cd /your_deepmd-kit_path
# Clone the repository, enter it, and switch to the asc-2022 branch
git clone --recursive https://github.com/deepmodeling/deepmd-kit.git deepmd-kit
cd deepmd-kit
git checkout -b asc22 remotes/origin/asc-2022
export DP_VARIANT="cuda"
pip install .
# Alternatively, install in development mode
export DP_VARIANT="cuda"
python setup.py develop
# or: pip install -v -e .
# Set up the training environment (NCCL, Open MPI, Horovod)
dpkg -i nccl-local-repo-ubuntu2004-2.8.4-cuda11.2_1.0-1_amd64.deb
apt install libnccl2=2.8.4-1+cuda11.2 libnccl-dev=2.8.4-1+cuda11.2
apt install libopenmpi-dev
HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 HOROVOD_GPU_OPERATIONS=NCCL pip install horovod mpi4py
# Train
cd /your_data_path
CUDA_VISIBLE_DEVICES=0 horovodrun -np 1 dp train --mpi-log=workers input.json
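
The `dockerd --add-runtime` step in the listing above registers the NVIDIA runtime only for that one daemon invocation. A common way to make the registration persistent is to declare the runtime in Docker's daemon configuration file. The following is a minimal sketch, not the method used in this post: `write_nvidia_runtime_config` is a hypothetical helper name, and it assumes the runtime binary lives at `/usr/bin/nvidia-container-runtime` as in the listing.

```shell
# Sketch: write a Docker daemon config that registers the NVIDIA runtime.
# write_nvidia_runtime_config is a hypothetical helper, not a Docker command.
write_nvidia_runtime_config() {
    # $1: target file; /etc/docker/daemon.json is the usual location on Linux
    target="${1:-/etc/docker/daemon.json}"
    cat > "$target" <<'EOF'
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Example (as root). This overwrites the target file, so merge the "runtimes"
# entry by hand if a daemon.json already exists:
#   write_nvidia_runtime_config /etc/docker/daemon.json
#   systemctl restart docker
```

With this in place, Docker can be started normally through systemd instead of being launched by hand in the foreground.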

Proxies

pip

#  ~/.pip/pip.conf
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
proxy = http://58.199.160.174:3128
[install]
trusted-host=pypi.tuna.tsinghua.edu.cn
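
Inside a throwaway container it can be more convenient to skip the config file. pip also reads its options from environment variables named `PIP_<OPTION>`, so the same settings can be sketched as exports; the values below are simply copied from the `pip.conf` above.

```shell
# Same settings as the pip.conf above, expressed as environment variables.
# pip maps each long option to PIP_<UPPERCASE_NAME>.
export PIP_INDEX_URL="https://pypi.tuna.tsinghua.edu.cn/simple"
export PIP_PROXY="http://58.199.160.174:3128"
export PIP_TRUSTED_HOST="pypi.tuna.tsinghua.edu.cn"
```

These only affect the current shell session and its child processes, which keeps the proxy out of the image itself.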

git

# ~/.gitconfig
[http]
    proxy = http://58.199.160.174:3128
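
git reads its proxy from the `http.proxy` configuration key, which applies to both HTTP and HTTPS remotes. Rather than editing the file by hand, the key can be set with `git config`. The sketch below wraps that in a hypothetical helper, `set_git_proxy`, whose optional second argument is an assumption added here so the setting can be tried on a scratch file first.

```shell
# Sketch: set git's proxy via `git config` instead of editing ~/.gitconfig.
# set_git_proxy is a hypothetical helper name.
set_git_proxy() {
    # $1: proxy URL
    # $2: optional config file; when omitted, the global ~/.gitconfig is used
    if [ -n "$2" ]; then
        git config --file "$2" http.proxy "$1"
    else
        git config --global http.proxy "$1"
    fi
}

# Example:
#   set_git_proxy "http://58.199.160.174:3128"
# To undo it later:
#   git config --global --unset http.proxy
```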

apt

# /etc/apt/apt.conf
Acquire::http::Proxy "http://58.199.160.174:3128";
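
The line above only covers plain HTTP sources. If some repositories are served over HTTPS, apt has a matching `Acquire::https::Proxy` option. The sketch below writes both into a drop-in file instead of `/etc/apt/apt.conf`; the helper name `write_apt_proxy` and the file name `95proxy` are hypothetical, while `/etc/apt/apt.conf.d/` is apt's standard drop-in directory.

```shell
# Sketch: write an apt proxy drop-in covering both HTTP and HTTPS sources.
# write_apt_proxy and the default file name 95proxy are hypothetical.
write_apt_proxy() {
    # $1: proxy URL; $2: target file
    proxy="$1"
    target="${2:-/etc/apt/apt.conf.d/95proxy}"
    printf 'Acquire::http::Proxy "%s";\nAcquire::https::Proxy "%s";\n' \
        "$proxy" "$proxy" > "$target"
}

# Example (as root):
#   write_apt_proxy "http://58.199.160.174:3128"
#   apt update
```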

wget

#~/.wgetrc
https_proxy = http://10.0.65.18:8888/
http_proxy = http://10.0.65.18:8888/
ftp_proxy = http://10.0.65.18:8888/

# If you do not want to use proxy at all, set this to off.
use_proxy = on
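
wget also honors the conventional proxy environment variables, so the same proxy can be set per shell session without a `~/.wgetrc`. A minimal sketch follows; the proxy address is the one from the file above, and the `no_proxy` value is only an illustrative assumption.

```shell
# Session-level equivalent of the ~/.wgetrc above.
proxy_url="http://10.0.65.18:8888/"
export http_proxy="$proxy_url"
export https_proxy="$proxy_url"
export ftp_proxy="$proxy_url"
# no_proxy lists hosts that should bypass the proxy (example value only).
export no_proxy="localhost,127.0.0.1"
```

For a single download that should skip the proxy, wget accepts the `--no-proxy` flag.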

