Compile error when building extension 'deform_conv' #18

Closed
opened 2026-01-29 21:37:05 +00:00 by claunia · 11 comments

Originally created by @syfbme on GitHub (Jul 7, 2021).

I followed the "Installation" instructions but ran into the error below. Details are attached. Any help would be appreciated.
When running the command: `CUDA_HOME=/usr/local/cuda-10.2 BASICSR_EXT=True pip install basicsr`
the following error is shown:
ERROR: Command errored out with exit status 1:
command: /data/anaconda3/envs/pytorch18/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/setup.py'"'"'; __file__='"'"'/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-uoof9gcv
cwd: /tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/
Complete output (260 lines):
Traceback (most recent call last):
  File "/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/basicsr/ops/dcn/deform_conv.py", line 10, in <module>
    from . import deform_conv_ext
ImportError: cannot import name 'deform_conv_ext' from partially initialized module 'basicsr.ops.dcn' (most likely due to a circular import) (/tmp/pip-install-ekcq6eum/basicsr_4146e4815a4548d195157ceecc274ac0/basicsr/ops/dcn/__init__.py)

The error says that deform_conv_ext cannot be imported. I think this means the deform_conv_ext module was not built successfully, but I don't know how to check that when installing with pip. So I tried installing basicsr from a git clone and compiling it, following [this link](https://github.com/xinntao/BasicSR). But when I ran the command:
`CUDA_HOME=/usr/local/cuda-10.2 BASICSR_EXT=True python setup.py develop`
it failed while building the 'basicsr.ops.dcn.deform_conv_ext' extension:
![image](https://user-images.githubusercontent.com/13032160/124730952-60383a80-df44-11eb-9888-15b122e5a4ea.png)

I searched online and found no useful information. Here is my environment:
gcc: 7.5.0
cuda: 10.2
![image](https://user-images.githubusercontent.com/13032160/124731512-e05ea000-df44-11eb-9e3e-cd455def47bf.png)
torch.__config__.show()
'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n


@xinntao commented on GitHub (Jul 7, 2021):

I do not know the exact reason.

1. As we do not need to use dcn, you may remove the dcn compile part when running `CUDA_HOME=/usr/local/cuda-10.2 BASICSR_EXT=True python setup.py develop`. Comment out the following part:
![image](https://user-images.githubusercontent.com/17445847/124734247-84e1e180-df47-11eb-8681-63e9d6d6281a.png)

2. You can also use BASICSR_JIT=True:
   1) Uninstall all copies of basicsr.
   2) Just run `pip install basicsr`.
   3) Test with the environment variable BASICSR_JIT=True, which will compile the necessary package just in time.
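
Setting BASICSR_JIT=True for a single run can be sketched in Python as below. This is a minimal sketch, not part of BasicSR: the helper name `run_with_jit` is hypothetical, and the child command is a stand-in that only echoes the variable back. In practice the command would be the inference script you want to test.

```python
import os
import subprocess
import sys


def run_with_jit(command):
    """Run `command` with BASICSR_JIT=True set in a copy of the current environment.

    `run_with_jit` is a hypothetical helper name; only the variable name
    BASICSR_JIT comes from the comment above.
    """
    env = os.environ.copy()
    env["BASICSR_JIT"] = "True"
    # Capture stdout/stderr so the full build log can be attached to a report.
    return subprocess.run(command, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process: prints the variable as the child sees it.
    check = [sys.executable, "-c", "import os; print(os.environ['BASICSR_JIT'])"]
    result = run_with_jit(check)
    print(result.stdout.strip())
```

Because the output is captured, `result.stdout` and `result.stderr` hold everything the child printed, which is useful when the just-in-time build fails.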

@syfbme commented on GitHub (Jul 8, 2021):

Hi @xinntao
Thanks for your quick reply. I have tried both of the ways you suggested.
For the 1st way, there are still errors, now complaining about building 'fused_act_ext'...
For the 2nd way, the installation is okay since it doesn't build extensions. However, when testing with BASICSR_JIT=True, it shows "No module named deform_env", which I think was caused by the just-in-time build failing...
I don't understand what you meant by "As we do not need to use dcn". The script output shows that we need to import deform_conv_ext, which is from basicsr.ops.dcn...
![image](https://user-images.githubusercontent.com/13032160/124847332-38d98000-dfcd-11eb-97e3-6675a16b4ce9.png)


@tachikoma777 commented on GitHub (Jul 8, 2021):

BASICSR_JIT=True python inference_gfpgan_full.py --model_path experiments/pretrained_models/GFPGANv1.pth --test_path inputs/whole_imgs
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/deform_conv.py", line 10, in <module>
    from . import deform_conv_ext
ImportError: cannot import name 'deform_conv_ext' from 'basicsr.ops.dcn' (/opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1673, in _run_ninja_build
    env=env)
  File "/opt/conda/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

I got a similar error...


@syfbme commented on GitHub (Jul 8, 2021):

Hi @tachikoma777
Which gcc version was your PyTorch built with, and what is your current gcc version?
You can run `torch.__config__.show()` in Python to see the gcc version PyTorch was built with.


@tachikoma777 commented on GitHub (Jul 8, 2021):

@syfbme

torch.__config__.show()
'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 10.2\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n - CuDNN 7.6.5\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON,USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n'

Did you fix this?
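
The relevant fields can be pulled out of that long string with a few regular expressions. The following is a minimal sketch, assuming the text has the layout shown above; the function name `parse_build_info` and the abridged `sample` string are illustrative, and with PyTorch installed you would pass `torch.__config__.show()` instead.

```python
import re


def parse_build_info(config_text):
    """Extract a few version fields from text shaped like torch.__config__.show().

    `parse_build_info` is a hypothetical helper; the patterns assume the
    layout seen in the output quoted in this thread.
    """
    patterns = {
        "gcc": r"GCC (\d+(?:\.\d+)*)",
        "cuda_runtime": r"CUDA Runtime (\d+(?:\.\d+)*)",
        "torch": r"TORCH_VERSION=(\d+(?:\.\d+)*)",
    }
    info = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, config_text)
        # Missing fields are reported as None rather than raising.
        info[key] = match.group(1) if match else None
    return info


# Abridged copy of the output posted above.
sample = (
    "PyTorch built with:\n  - GCC 7.3\n  - C++ Version: 201402\n"
    "  - CUDA Runtime 10.2\n"
    "  - Build settings: CUDA_VERSION=10.2, TORCH_VERSION=1.8.1, USE_CUDA=ON, \n"
)
print(parse_build_info(sample))
# prints {'gcc': '7.3', 'cuda_runtime': '10.2', 'torch': '1.8.1'}
```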


@syfbme commented on GitHub (Jul 8, 2021):

Hi @tachikoma777
What is your current gcc version? You can see it with the command "gcc -v".
No, I haven't fixed it... I am still trying...
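
The local compiler version can also be read from Python, which makes it easy to print next to the PyTorch build information. This is a minimal sketch: the helper names are hypothetical, it uses `gcc -dumpversion` rather than `gcc -v` because the output is a bare version number, and the value 7.3 is the one reported in this thread. Whether a mismatch is actually the cause of the build failure is not established here.

```python
import shutil
import subprocess


def local_gcc_version(compiler="gcc"):
    """Return the output of `<compiler> -dumpversion`, or None if it is not on PATH."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run([path, "-dumpversion"], capture_output=True, text=True)
    return result.stdout.strip() or None


def major(version):
    """Return the leading integer of a dotted version string such as '7.5.0'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    built_with = "7.3"  # value reported by torch.__config__.show() in this thread
    local = local_gcc_version()
    if local is None:
        print("gcc not found on PATH")
    else:
        same = major(local) == major(built_with)
        print(f"local gcc {local}, PyTorch built with GCC {built_with}, same major: {same}")
```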


@xinntao commented on GitHub (Jul 8, 2021):

> Hi @xinntao
> Thanks for your quick reply. I have tried both of the ways you suggested.
> For the 1st way, there are still errors, now complaining about building 'fused_act_ext'...
> For the 2nd way, the installation is okay since it doesn't build extensions. However, when testing with BASICSR_JIT=True, it shows "No module named deform_env", which I think was caused by the just-in-time build failing...
> I don't understand what you meant by "As we do not need to use dcn". The script output shows that we need to import deform_conv_ext, which is from basicsr.ops.dcn...
> ![image](https://user-images.githubusercontent.com/13032160/124847332-38d98000-dfcd-11eb-97e3-6675a16b4ce9.png)

1. Regarding "I don't understand what you meant ...": BasicSR will import dcn. But I think that if the dcn compilation fails, fused_act_ext will fail too.
2. If you set `BASICSR_JIT=True`, the screen will print a lot of information. Could you attach it?

@xinntao commented on GitHub (Jul 8, 2021):

@tachikoma777
Could you show the output of `ls /opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/`?
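
The same directory listing can be narrowed to files that look like a compiled extension. This is a minimal sketch, assuming a built extension appears as a file whose name starts with `deform_conv_ext` and ends in `.so` (or `.pyd` on Windows); the helper name `find_compiled_ext` is hypothetical and the path is the one from the traceback above.

```python
from pathlib import Path


def find_compiled_ext(package_dir, stem="deform_conv_ext"):
    """Return sorted names of files in package_dir that look like a built extension.

    Hypothetical helper: it assumes compiled modules are named
    `<stem>*.so` or `<stem>*.pyd`, and returns [] if the directory is missing.
    """
    directory = Path(package_dir)
    if not directory.is_dir():
        return []
    suffixes = (".so", ".pyd")
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.name.startswith(stem) and p.name.endswith(suffixes)
    )


if __name__ == "__main__":
    dcn_dir = "/opt/conda/lib/python3.7/site-packages/basicsr/ops/dcn/"
    found = find_compiled_ext(dcn_dir)
    print(found if found else "no compiled deform_conv_ext found")
```

An empty result would suggest that the extension was never built for that installation.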


@xinntao commented on GitHub (Jul 12, 2021):

@syfbme Hi,
have you solved this issue?


@syfbme commented on GitHub (Jul 12, 2021):

> @syfbme Hi,
> have you solved this issue?

Yes, by reinstalling the OS... I lost the original environment, so I can't reproduce the issue. Below is my guess:
I had updated gcc to version 9.3 for some reason. Then I found that PyTorch was built with gcc 7.3.0. I installed gcc through apt, but it could only provide v7.5.0, and there was still a gcc compatibility issue. So I compiled and installed gcc 7.3.0, and the issue still existed. At that point there should not have been a gcc version issue... I don't know why; maybe the gcc installation was messed up. So I reinstalled the whole operating system, and this time there is no issue.
Sorry for not being able to help...


@syfbme commented on GitHub (Jul 12, 2021):

Closing this since I can't reproduce the issue.

Reference: TencentARC/GFPGAN#18