mirror of
https://github.com/TencentARC/GFPGAN.git
synced 2026-02-15 05:44:38 +00:00
Abnormal training logs #20
Originally created by @SimKarras on GitHub (Jul 6, 2021).
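Why does training produce no output after I switch from 4 GPUs to 2 GPUs? Nothing appears in the live terminal, and the log file that should be created under the project folder in experiments is missing as well. The changes are as follows:
Judging by the GPUs, training has started.
P.S. I ran into a similar situation earlier when reproducing ESRGAN from your BasicSR.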
@xinntao commented on GitHub (Jul 6, 2021):
I have not run into this problem. I need more information.
@SimKarras commented on GitHub (Jul 6, 2021):
@xinntao Thanks for your reply. I have not been able to get access to four GPUs yet.
pytorch = '1.8.0+cu111'
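Single-GPU ESRGAN training in the same environment had no such problem before.
It looks like the only thing going wrong is that the logs do not appear: training proceeds as usual, the models are saved, and wandb works fine.
If you never saw this with two GPUs, it is probably caused by an environment mismatch.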
@SimKarras commented on GitHub (Jul 6, 2021):
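To add: ESRGAN uses single-GPU training to begin with, so it had no problem. But other projects in BasicSR are configured for four GPUs, and switching them to two GPUs causes a similar problem.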
@xinntao commented on GitHub (Jul 7, 2021):
This problem is indeed strange. I did not encounter it with PyTorch 1.8 and CUDA 10.2.
Does your program save a .log file?
@SimKarras commented on GitHub (Jul 7, 2021):
@xinntao I just checked again: there is indeed no .log file, and the terminal shows no output either.

Everything else is normal.
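When training runs but nothing is logged, it can help to look at how the logger is actually configured at that point. The snippet below is a minimal diagnostic sketch using only Python's standard `logging` module. The helper name `describe_logger` and the logger name `"demo"` are hypothetical and are not part of BasicSR or GFPGAN; substitute whatever logger name your training script uses.

```python
import logging


def describe_logger(name):
    """Summarize a logger's effective level and every handler that can receive its records."""
    logger = logging.getLogger(name)
    lines = [
        f"{logger.name}: "
        f"effective level={logging.getLevelName(logger.getEffectiveLevel())}, "
        f"hasHandlers={logger.hasHandlers()}"
    ]
    # Walk up the logger hierarchy the same way record propagation does.
    current = logger
    while current is not None:
        for handler in current.handlers:
            lines.append(
                f"  {current.name} -> {type(handler).__name__} "
                f"(level={logging.getLevelName(handler.level)})"
            )
        if not current.propagate:
            break
        current = current.parent
    return lines


if __name__ == "__main__":
    # "demo" is a placeholder logger name used only for illustration.
    print("\n".join(describe_logger("demo")))
```

If no `StreamHandler` or `FileHandler` shows up for the logger, or its effective level is above `INFO`, messages logged at `INFO` will not reach the terminal or a log file.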
@SimKarras commented on GitHub (Jul 8, 2021):
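@xinntao Hello, I have some ideas about improving the network and would like to discuss with you whether they are feasible. Could you give me a way to contact you? My email: jiaweishi.cv@qq.com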
@SimKarras commented on GitHub (Jul 8, 2021):
With 8 GPUs, logging works normally.
@syfbme commented on GitHub (Jul 12, 2021):
same issue...
@xinntao commented on GitHub (Jul 12, 2021):
@syfbme @JiaweiShiCV
I cannot reproduce this issue. Could you guys help me to debug it?
It may be caused by the logging mechanism in BasicSR.
In the basicsr folder: basicsr/utils/logger.py Line106 -Line40
Could you please add these lines and post the outputs here? Thanks
@syfbme commented on GitHub (Jul 12, 2021):
Hi @xinntao
Only output "Enter get_root_logger"
@xinntao commented on GitHub (Jul 12, 2021):
@syfbme Thanks
It is strange...
Could you please modify this function as follows, and post the outputs?
@syfbme commented on GitHub (Jul 12, 2021):
Hi @xinntao

I only used 1 GPU to keep the output cleaner. Below is the output:
Only the first time enter has "add handlers" and "last return"
@xinntao commented on GitHub (Jul 12, 2021):
@syfbme If it prints "add handlers" and "last return", then the issue has been solved.
So you can now see the screen output, and you also have a log file in the experiments folder, right?
@xinntao commented on GitHub (Jul 12, 2021):
@JiaweiShiCV
Are you still running into the problem of there being no .log file and no terminal output?
@SimKarras commented on GitHub (Jul 12, 2021):
@xinntao At the moment both 8 GPUs and 4 GPUs work fine for me. With 2 GPUs there is probably still no output.
@xinntao commented on GitHub (Jul 12, 2021):
@JiaweiShiCV
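Could you help test the following fix on two GPUs (that is, the case where no log is output)? I cannot reproduce it on my side, so I cannot debug it.
In the BasicSR folder: basicsr/utils/logger.py Line106 -Line40
Change it to:
Thanks!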
@SimKarras commented on GitHub (Jul 12, 2021):
@xinntao OK.
@syfbme commented on GitHub (Jul 12, 2021):
Yes. Thanks~
@SimKarras commented on GitHub (Jul 12, 2021):
@xinntao Terminal output with two GPUs:
Log file contents:
@xinntao commented on GitHub (Jul 12, 2021):
@syfbme Thanks for your feedback!
@xinntao commented on GitHub (Jul 12, 2021):
@JiaweiShiCV
It seems that this issue could be solved by the above modification!
@SimKarras commented on GitHub (Jul 12, 2021):
@xinntao ... Isn't the output still only this little bit?
@xinntao commented on GitHub (Jul 12, 2021):
@JiaweiShiCV Did it not keep printing after that...?
@SimKarras commented on GitHub (Jul 12, 2021):
No......
@xinntao commented on GitHub (Jul 12, 2021):
@JiaweiShiCV
This bug has been fixed in BasicSR:
bf93f27e88
It should be OK now!
@SimKarras commented on GitHub (Jul 12, 2021):
@xinntao So reinstalling basicsr=1.3.3.5 will fix it, right?
@xinntao commented on GitHub (Jul 12, 2021):
The fix is currently only on the master branch and there is no new release yet. I will publish a new version, 1.3.3.6, now.
@SimKarras commented on GitHub (Jul 12, 2021):
ok!
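For readers who hit a similar symptom: one general way a logger can end up silent (not confirmed as the cause in this thread) is a setup function that returns early when `logger.hasHandlers()` is true. In Python's `logging` module, `hasHandlers()` also returns `True` when an ancestor such as the root logger has handlers, so setup can be skipped if another library configured the root logger first. The sketch below tracks initialization explicitly instead. It is an illustration only, not the code from the BasicSR commit above; `get_logger`, `_initialized`, and the `rank` parameter are hypothetical names.

```python
import logging

# Names of loggers this module has already configured (hypothetical helper state).
_initialized = set()


def get_logger(name="demo", log_file=None, rank=0, level=logging.INFO):
    """Return a configured logger; handlers are attached only on the first call per name."""
    logger = logging.getLogger(name)
    if name in _initialized:
        return logger

    # Do not rely on handlers that other libraries may have put on the root logger.
    logger.propagate = False
    if rank != 0:
        # Assumption: only the rank-0 process should emit regular log messages.
        logger.setLevel(logging.ERROR)
    else:
        logger.setLevel(level)
        formatter = logging.Formatter("%(asctime)s %(levelname)s: %(message)s")
        stream_handler = logging.StreamHandler()
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
        if log_file is not None:
            file_handler = logging.FileHandler(log_file, "w")
            file_handler.setFormatter(formatter)
            logger.addHandler(file_handler)

    _initialized.add(name)
    return logger
```

Calling `get_logger("demo", log_file="train.log")` once at startup and `get_logger("demo")` everywhere else returns the same logger without attaching duplicate handlers.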