subprocess.CalledProcessError and RuntimeError: mat1 dim 1 must match mat2 dim 0 #75

Closed
opened 2026-01-29 21:40:58 +00:00 by claunia · 3 comments
Owner

Originally created by @asdf1996 on GitHub (Sep 23, 2021).

Hi,
I got two errors when training a model with this command:

BASICSR_JIT=True python -m torch.distributed.launch --nproc_per_node=4 --master_port=22022 gfpgan/train.py -opt options/train_gfpgan_v1_simple.yml --launcher pytorch

The errors are:

Traceback (most recent call last):
  File "gfpgan/train.py", line 11, in <module>
    train_pipeline(root_path)
  File "/mnt/bd/workshop/BasicSR/basicsr/train.py", line 169, in train_pipeline
    model.optimize_parameters(current_iter)
  File "/mnt/bd/workshop/GFPGAN/gfpgan/models/gfpgan_model.py", line 305, in optimize_parameters
    self.output, out_rgbs = self.net_g(self.lq, return_rgb=True)
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/bd/workshop/GFPGAN/gfpgan/archs/gfpganv1_arch.py", line 355, in forward
    style_code = self.final_linear(feat.view(feat.size(0), -1))
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/bd/workshop/BasicSR/basicsr/archs/stylegan2_arch.py", line 174, in forward
    out = F.linear(x, self.weight * self.scale, bias=bias)
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0
Traceback (most recent call last):
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/tiger/miniconda3/envs/GFP2/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/tiger/miniconda3/envs/GFP2/bin/python', '-u', 'gfpgan/train.py', '--local_rank=3', '-opt', 'options/train_gfpgan_v1_simple.yml', '--launcher', 'pytorch']' returned non-zero exit status 1.

my environment:

_libgcc_mutex             0.1                        main  
_openmp_mutex             4.5                       1_gnu  
absl-py                   0.14.0                   pypi_0    pypi
addict                    2.4.0                    pypi_0    pypi
basicsr                   1.3.4.2                   dev_0    <develop>
blas                      1.0                         mkl  
ca-certificates           2021.7.5             h06a4308_1  
cachetools                4.2.2                    pypi_0    pypi
certifi                   2021.5.30        py38h06a4308_0  
charset-normalizer        2.0.6                    pypi_0    pypi
cudatoolkit               10.1.243             h6bb024c_0  
cycler                    0.10.0                   pypi_0    pypi
facexlib                  0.2.1.0                  pypi_0    pypi
filterpy                  1.4.5                    pypi_0    pypi
fonttools                 4.27.0                   pypi_0    pypi
freetype                  2.10.4               h5ab3b9f_0  
future                    0.18.2                   pypi_0    pypi
gfpgan                    0.2.1                     dev_0    <develop>
google-auth               2.1.0                    pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
grpcio                    1.41.0rc2                pypi_0    pypi
idna                      3.2                      pypi_0    pypi
imageio                   2.9.0                    pypi_0    pypi
intel-openmp              2021.3.0          h06a4308_3350  
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.3.2                    pypi_0    pypi
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.35.1               h7274673_9  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgomp                   9.3.0               h5101ec6_17  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtiff                   4.2.0                h85742a9_0  
libuv                     1.40.0               h7b6447c_0  
libwebp-base              1.2.0                h27cfd23_0  
llvmlite                  0.37.0                   pypi_0    pypi
lmdb                      1.2.1                    pypi_0    pypi
lz4-c                     1.9.3                h295c915_1  
markdown                  3.3.4                    pypi_0    pypi
matplotlib                3.5.0b1                  pypi_0    pypi
mkl                       2021.3.0           h06a4308_520  
mkl-service               2.4.0            py38h7f8727e_0  
mkl_fft                   1.3.0            py38h42c9631_2  
mkl_random                1.2.2            py38h51133e4_0  
ncurses                   6.2                  he6710b0_1  
networkx                  2.6.3                    pypi_0    pypi
ninja                     1.10.2               hff7bd54_1  
numba                     0.54.0                   pypi_0    pypi
numpy                     1.20.3           py38hf144106_0  
numpy-base                1.20.3           py38h74d4b33_0  
oauthlib                  3.1.1                    pypi_0    pypi
olefile                   0.46               pyhd3eb1b0_0  
opencv-python             4.5.3.56                 pypi_0    pypi
openjpeg                  2.4.0                h3ad879b_0  
openssl                   1.1.1l               h7f8727e_0  
packaging                 21.0                     pypi_0    pypi
pillow                    8.3.1            py38h2c7a002_0  
pip                       21.0.1           py38h06a4308_0  
protobuf                  4.0.0rc2                 pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pyparsing                 3.0.0rc1                 pypi_0    pypi
python                    3.8.11          h12debd9_0_cpython  
python-dateutil           2.8.2                    pypi_0    pypi
pytorch                   1.7.1           py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
pywavelets                1.1.1                    pypi_0    pypi
pyyaml                    5.4.1                    pypi_0    pypi
readline                  8.1                  h27cfd23_0  
requests                  2.26.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.7.2                    pypi_0    pypi
scikit-image              0.18.3                   pypi_0    pypi
scipy                     1.7.1                    pypi_0    pypi
setuptools                58.0.4           py38h06a4308_0  
setuptools-scm            6.3.2                    pypi_0    pypi
six                       1.16.0             pyhd3eb1b0_0  
sqlite                    3.36.0               hc218d9a_0  
tb-nightly                2.7.0a20210922           pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
tifffile                  2021.8.30                pypi_0    pypi
tk                        8.6.10               hbc83047_0  
tomli                     1.2.1                    pypi_0    pypi
torchaudio                0.7.2                      py38    pytorch
torchvision               0.8.2                py38_cu101    pytorch
tqdm                      4.62.3                   pypi_0    pypi
typing_extensions         3.10.0.2           pyh06a4308_0  
tzdata                    2021a                h5d7bf9c_0  
urllib3                   1.26.6                   pypi_0    pypi
werkzeug                  2.0.1                    pypi_0    pypi
wheel                     0.37.0                   pypi_0    pypi
xz                        5.2.5                h7b6447c_0  
yapf                      0.31.0                   pypi_0    pypi
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.9                haebb681_0  

Could you help me fix these errors? Thanks!

Author
Owner

@xinntao commented on GitHub (Sep 23, 2021):

It seems that the data dimension is not correct.
You may need to check:

  1. your data shape (should be 512x512)
  2. the network configuration
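
To illustrate why the input size matters here: the traceback shows `feat.view(feat.size(0), -1)` being passed to `final_linear`, so the number of flattened features depends on the spatial size of the encoder output. The sketch below works through that arithmetic. The constants are assumptions for illustration and are not confirmed in this thread: seven halvings of the spatial size (512 down to 4), 512 channels at the last encoder level, and floor division for sizes that do not halve evenly.

```python
# Assumptions (not taken from this thread): the encoder halves the spatial
# size 7 times (512 -> 4) and has 512 channels at its last level, so the
# final linear layer would expect 512 * 4 * 4 input features.
NUM_DOWNSAMPLES = 7
LAST_CHANNELS = 512
EXPECTED_FEATURES = LAST_CHANNELS * 4 * 4


def flattened_features(height: int, width: int) -> int:
    """Feature count per image after flattening the encoder output."""
    for _ in range(NUM_DOWNSAMPLES):
        # Assumed: each stage halves height and width, rounding down.
        height, width = height // 2, width // 2
    return LAST_CHANNELS * height * width


def fits_final_linear(height: int, width: int) -> bool:
    """True if the flattened size matches what the linear layer expects."""
    return flattened_features(height, width) == EXPECTED_FEATURES


for size in (512, 256, 1024):
    print(size, flattened_features(size, size), fits_final_linear(size, size))
```

Under these assumptions only a 512x512 input produces the expected feature count; any other size gives a matrix whose second dimension does not match the layer weight, which is the kind of mismatch `torch.addmm` reports.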
Author
Owner

@asdf1996 commented on GitHub (Sep 23, 2021):

I checked the data, and some images are not 512x512. Thanks.
Besides, I'd like to know how to augment the dataset.
You have mentioned:

More high quality faces can improve the restoration quality.

If some images collected from the internet are not as high quality as FFHQ, what preprocessing can be applied to those training images?

Thanks for your reply!
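
The size check mentioned in this comment could be sketched as follows. This is a minimal, dependency-free example that assumes the training images are PNG files stored as loose files in a folder (not in an LMDB database); it reads the width and height from the PNG header only. The function names `png_size` and `find_wrong_sized` are hypothetical helpers, not part of GFPGAN or BasicSR.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) from the PNG IHDR chunk, or None if not a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    # ("IHDR"), then width and height as big-endian 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_wrong_sized(root, expected=(512, 512)):
    """List (path, size) for each *.png under root whose size is not expected.

    Files that cannot be parsed as PNG are reported with size None.
    """
    bad = []
    for path in sorted(Path(root).rglob("*.png")):
        size = png_size(path)
        if size != expected:
            bad.append((path, size))
    return bad
```

Calling `find_wrong_sized("path/to/training/images")` (a placeholder path) returns the files to fix or exclude before training. If Pillow is installed, `Image.open(path).size` gives the same `(width, height)` pair and also covers other image formats.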

Author
Owner

@BruceZhou95 commented on GitHub (Mar 29, 2023):

I check the data, some imgs are not the 512*512 shape. Thanks. Besides, I'd like to know how to augment the dataset. You have mentioned

More high quality faces can improve the restoration quality.

If some collected internet imgs are not so high quality like FFHQ, what pre-process can be applied to those training imgs?

Thanks for your reply!
Hello, I have the same problem as you. How do I change the input image size for high-quality images? Did you get an answer?
Hoping for your reply! Thanks.


Reference: TencentARC/GFPGAN#75