mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
Issues with ISDB (Brazil) support #118
Originally created by @cfsmp3 on GitHub (Feb 20, 2016).
Originally assigned to: @Abhinav95 on GitHub.
All seem easy to solve (so possibly a good chance for GSOC applicants to get points).
On the Brazilian files, the text is great (ignore the failure to show utf8 properly), but the timestamps don't increment -- and the second timestamp doesn't respond to the -delay switch:
ccextractor -datets -pn 48352 -noru -out=ttxt -unixts 0 -delay 1454292360000 -o 2016-02-01_0206_BR_Globo_Test_02.ccx2.out /tv/2016/2016-02/2016-02-01/2016-02-01_0206_BR_Globo_Test_02.mpg
csa@esfinge:/tv/2016/2016-02/2016-02-01$ cat 2016-02-01_0206_BR_Globo_Test_02.ccx2.out
20160201020600,000|19700101000000,002|ISDB|PARTICIPOU DAS PRINCIPAIS
20160201020600,000|19700101000000,002|ISDB|JOGADAS OFENSIVAS DO... OUTRO
20160201020600,000|19700101000000,002|ISDB|TOQUE, E DEPOIS ELE ENTRA PELA
20160201020600,000|19700101000000,002|ISDB|JOGADAS OFENSIVAS DO... OUTRO
20160201020600,000|19700101000000,002|ISDB|TOQUE, E DEPOIS ELE ENTRA PELA
20160201020600,000|19700101000000,002|ISDB|ESQUERDA, OLHA L� O ESPA�O QUE
20160201020600,000|19700101000000,002|ISDB|TINHA O JORGE HENRIQUE PARA
20160201020600,000|19700101000000,002|ISDB|TENTAR DAR UM TOQUE. E NEN�
20160201020600,000|19700101000000,002|ISDB|AINDA FAZ O GESTO, P�, PODIA TER
20160201020600,000|19700101000000,002|ISDB|TOCADO PARA MIM.
20160201020600,000|19700101000000,002|ISDB|>> [LUIS ROBERTO] LANCES DO
20160201020600,000|19700101000000,002|ISDB|PRIMEIRO TEMPO EM S�O JANEIRO
20160201020600,000|19700101000000,002|ISDB|PASSO AIR, A COLINA HIST�RICA,
If I include the -UCLA switch, I do get the offset on the second timestamp:
ccextractor -datets -pn 48352 -UCLA -noru -ttxt -unixts 0 -delay 1454292360000 -o $FIL.ccx3.out /tv/2016/2016-02/2016-02-01/$FIL.mpg
but no incremental time, so this appears to be the only actual problem.
I also get 'NA' in the third column. We no longer use the teletext page number that is added to the third column by the -UCLA switch so it should be removed.
I've placed the file on http://vrnewsscape.ucla.edu/dropbox/2016-02-01_BR_Globo_Test_01.mpg
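As a sanity check on the expected values: 1454292360000 milliseconds after the Unix epoch is 2016-02-01 02:06:00 UTC, which matches the first column above, while the second column shows only the raw 2 ms offset from 1970-01-01. Below is a minimal sketch in C of the arithmetic, assuming the -delay value is simply added, in milliseconds, to a zero reference. format_datets is a hypothetical helper for illustration, not a CCExtractor function.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not CCExtractor code): formats delay_ms + offset_ms,
 * interpreted as milliseconds since the Unix epoch, as YYYYMMDDHHMMSS,mmm
 * in UTC. Returns 0 on success, -1 on failure. */
int format_datets(int64_t delay_ms, int64_t offset_ms, char *buf, size_t buflen)
{
    int64_t total_ms = delay_ms + offset_ms;
    if (total_ms < 0)
        return -1;

    time_t secs = (time_t)(total_ms / 1000);
    struct tm *tm = gmtime(&secs);
    if (tm == NULL)
        return -1;

    char date[16]; /* 14 digits plus the terminating NUL */
    if (strftime(date, sizeof date, "%Y%m%d%H%M%S", tm) == 0)
        return -1;

    int n = snprintf(buf, buflen, "%s,%03d", date, (int)(total_ms % 1000));
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With the delay applied to both columns, a caption ending 2 ms into the stream would read 20160201020600,002 rather than 19700101000000,002.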
@YorkHe commented on GitHub (Mar 13, 2016):
@cfsmp3 This file crashes on Windows when using -datets... I'm debugging it.
@YorkHe commented on GitHub (Mar 13, 2016):
@cfsmp3 From tracing the bug: the function get_cinfo in ts_info.c failed to return a valid timestamp in cinfo->codec_private_data (the invalid value returned is 14829735431805717965), which then goes into sub->data->endtime. Finally, that caused an invalid strftime call in utility.c. On Windows, it raised an assertion error.
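Note that 14829735431805717965 is 0xCDCDCDCDCDCDCDCD in hexadecimal, the byte pattern the MSVC debug runtime writes into freshly allocated heap memory, which suggests the field was read before it was ever set. Below is a minimal sketch in C of a guard that rejects such values before they reach strftime. ms_to_utc and MAX_SANE_MS are hypothetical names for illustration, not part of CCExtractor, and the sketch assumes a 64-bit time_t and timestamps in milliseconds since the Unix epoch.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical upper bound: 9999-12-31 23:59:59.999 UTC, in milliseconds. */
#define MAX_SANE_MS 253402300799999LL

/* Hypothetical guard (not CCExtractor code): converts milliseconds since the
 * Unix epoch to broken-down UTC time. Returns 0 on success, or -1 if the
 * value is negative, implausibly large, or rejected by gmtime. */
int ms_to_utc(int64_t ms, struct tm *out)
{
    if (ms < 0 || ms > MAX_SANE_MS)
        return -1;

    time_t secs = (time_t)(ms / 1000);
    struct tm *tm = gmtime(&secs);
    if (tm == NULL) /* gmtime may fail for values it cannot represent */
        return -1;

    *out = *tm;
    return 0;
}
```

A caller would format the result with strftime only when ms_to_utc returns 0. A guard like this hides the symptom; the uninitialized field itself would still need to be set.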
@anshul1912 commented on GitHub (Mar 15, 2016):
Do you propose any solution for this problem?
with regards
Anshul
:)
@Abhinav95 commented on GitHub (Mar 17, 2016):
Pull request #334
The command which I am running is:-
ccextractor -datets -pn 48352 -noru -out=ttxt -unixts 0 -delay 1454292360000 2016-02-01_BR_Globo_Test_01.mpg -ucla
The old output looks like:-
20160201020600.000|20160201020600.002|NAISDB|MERGULHO. A GENTE ESTÁ REVENDO A
20160201020600.000|20160201020600.002|NAISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|NAISDB|ARNALDO CÉZAR
20160201020600.000|20160201020600.002|NAISDB|MERGULHO. A GENTE ESTÁ REVENDO A
20160201020600.000|20160201020600.002|NAISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|NAISDB|ARNALDO CÉZAR COELHO
20160201020600.000|20160201020600.002|NAISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|NAISDB|ARNALDO CÉZAR COELHO
20160201020600.000|20160201020600.002|NAISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|NAISDB|ARNALDO CÉZAR COELHO
20160201020600.000|20160201020600.002|NAISDB|>>
20160201020600.000|20160201020600.002|NAISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|NAISDB|ARNALDO CÉZAR COELHO
20160201020600.000|20160201020600.002|NAISDB|>> VEJA
20160201020600.000|20160201020600.002|NAISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|NAISDB|ARNALDO CÉZAR COELHO
20160201020600.000|20160201020600.002|NAISDB|>> VEJA QUE O DANIEL CHEGA
20160201020600.000|20160201020600.002|NAISDB|ATRASADO, A QUEDA FOI
20160201020600.000|20160201020600.002|NAISDB|ESPETACULAR, MAS QUE HOUVE A
20160201020600.000|20160201020600.002|NAISDB|FALTA HOUVE, E CARTÃO AMARELO
20160201020600.000|20160201020600.002|NAISDB|BEM APLICADO.
20160201020600.000|20160201020600.002|NAISDB|>> [LUIS ROBERTO] MAIS UMA VEZ,
20160201020600.000|20160201020600.002|NAISDB|AGORA POR UM OUTRO ÂNGULO,
20160201020600.000|20160201020600.002|NAISDB|PORQUE FICA AQUELA... VAMOS LÁ
20160201020600.000|20160201020600.002|NAISDB|SE O ATACANTE NÃO VAI, MAS É TÃO
20160201020600.000|20160201020600.002|NAISDB|RÁPIDA A CHEGADA DO DANIEL E A
20160201020600.000|20160201020600.002|NAISDB|FALTA É ÓTIMA PARA O NENÊ. NENÊ
20160201020600.000|20160201020600.002|NAISDB|EXÍMIO COBRADOR DE FALTAS, MAS
20160201020600.000|20160201020600.002|NAISDB|ME ATREVO A DIZER QUE É MAIS
20160201020600.000|20160201020600.002|NAISDB|PARTIDO SOCIAL LIBERAL GOSTA
20160201020600.000|20160201020600.002|NAISDB|MAIS, ELE NÃO É
20160201020600.000|20160201020600.002|NAISDB|PELA BATIDA TÃO FORTE
The new output with 'NA' removed and timestamps incremented looks like:-
20160201020600.000|20160201020600.002|ISDB|MERGULHO. A GENTE ESTÁ REVENDO A
20160201020600.000|20160201020600.002|ISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.002|ISDB|ARNALDO CÉZAR
20160201020600.000|20160201020600.289|ISDB|MERGULHO. A GENTE ESTÁ REVENDO A
20160201020600.000|20160201020600.289|ISDB|JOGADA, O LANCE É TODO DO
20160201020600.000|20160201020600.289|ISDB|ARNALDO CÉZAR COELHO
20160201020600.289|20160201020600.520|ISDB|JOGADA, O LANCE É TODO DO
20160201020600.289|20160201020600.520|ISDB|ARNALDO CÉZAR COELHO
20160201020600.520|20160201020601.041|ISDB|JOGADA, O LANCE É TODO DO
20160201020600.520|20160201020601.041|ISDB|ARNALDO CÉZAR COELHO
20160201020600.520|20160201020601.041|ISDB|>>
20160201020601.041|20160201020601.388|ISDB|JOGADA, O LANCE É TODO DO
20160201020601.041|20160201020601.388|ISDB|ARNALDO CÉZAR COELHO
20160201020601.041|20160201020601.388|ISDB|>> VEJA
20160201020601.388|20160201020604.626|ISDB|JOGADA, O LANCE É TODO DO
20160201020604.626|20160201020606.361|ISDB|ARNALDO CÉZAR COELHO
20160201020606.361|20160201020606.766|ISDB|>> VEJA QUE O DANIEL CHEGA
20160201020606.766|20160201020611.566|ISDB|ATRASADO, A QUEDA FOI
20160201020611.566|20160201020613.533|ISDB|ESPETACULAR, MAS QUE HOUVE A
20160201020611.566|20160201020613.533|ISDB|FALTA HOUVE, E CARTÃO AMARELO
20160201020613.533|20160201020616.945|ISDB|BEM APLICADO.
20160201020616.945|20160201020619.432|ISDB|>> [LUIS ROBERTO] MAIS UMA VEZ,
20160201020619.432|20160201020622.613|ISDB|AGORA POR UM OUTRO ÂNGULO,
20160201020622.613|20160201020624.232|ISDB|PORQUE FICA AQUELA... VAMOS LÁ
20160201020624.232|20160201020628.859|ISDB|SE O ATACANTE NÃO VAI, MAS É TÃO
20160201020628.859|20160201020630.999|ISDB|RÁPIDA A CHEGADA DO DANIEL E A
20160201020630.999|20160201020633.023|ISDB|FALTA É ÓTIMA PARA O NENÊ. NENÊ
20160201020633.023|20160201020634.411|ISDB|EXÍMIO COBRADOR DE FALTAS, MAS
20160201020634.411|20160201020636.898|ISDB|ME ATREVO A DIZER QUE É MAIS
20160201020634.411|20160201020636.898|ISDB|PARTIDO SOCIAL LIBERAL GOSTA
20160201020634.411|20160201020636.898|ISDB|MAIS, ELE NÃO É
20160201020636.898|20160201020645.457|ISDB|PELA BATIDA TÃO FORTE
I realize that this fix is not yet the completely correct solution, since the end timestamp is simply taken as the start timestamp of the next subtitle. However, it fixes the problem of the timestamps not incrementing and gives much better output than what we had before.
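The idea of taking each end timestamp from the start of the next subtitle can be sketched in C as follows. This is an illustration only, not the code from the pull request: struct cue and fill_end_from_next_start are hypothetical names, and the sketch assumes that lines sharing a start time belong to the same caption screen and should therefore share an end time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cue record: times in milliseconds. */
struct cue {
    int64_t start_ms;
    int64_t end_ms;
};

/* Sets each cue's end time to the start time of the next cue that begins
 * later. Cues with no later cue after them keep their end time unchanged. */
void fill_end_from_next_start(struct cue *cues, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t j = i + 1;
        /* Skip cues that share this cue's start time (same screen). */
        while (j < n && cues[j].start_ms <= cues[i].start_ms)
            j++;
        if (j < n)
            cues[i].end_ms = cues[j].start_ms;
    }
}
```

The limitation mentioned above is visible here: a caption that is cleared from the screen before the next one appears still gets the next start time as its end time.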
@Abhinav95 commented on GitHub (Mar 18, 2016):
It turns out that the previous pull request fixed the broken samples but broke valid ones.
The new pull request #336 takes care of both (however, it is still a temporary fix for the problem of the global timestamp not being initialized).
New fixed timestamps for a previously broken file:-
20160201020600.000|20160201020600.520|ISDB|MERGULHO. A GENTE ESTÁ REVENDO A
20160201020600.520|20160201020604.453|ISDB|JOGADA, O LANCE É TODO DO
20160201020600.520|20160201020604.453|ISDB|ARNALDO CÉZAR COELHO
20160201020600.520|20160201020604.453|ISDB|>> VEJA QUE O DANIEL
20160201020604.453|20160201020604.626|ISDB|ARNALDO CÉZAR COELHO
20160201020604.453|20160201020604.626|ISDB|>> VEJA QUE O DANIEL CHEGA
20160201020604.626|20160201020606.361|ISDB|>> VEJA QUE O DANIEL CHEGA
20160201020604.626|20160201020606.361|ISDB|ATRASADO, A QUEDA FOI
20160201020606.361|20160201020606.766|ISDB|ATRASADO, A QUEDA FOI
20160201020606.361|20160201020606.766|ISDB|ESPETACULAR,
20160201020606.766|20160201020606.940|ISDB|ATRASADO, A QUEDA FOI
20160201020606.766|20160201020606.940|ISDB|ESPETACULAR, MAS
20160201020606.940|20160201020607.287|ISDB|ATRASADO, A QUEDA FOI
20160201020606.940|20160201020607.287|ISDB|ESPETACULAR, MAS QUE
20160201020607.287|20160201020608.154|ISDB|ATRASADO, A QUEDA FOI
20160201020607.287|20160201020608.154|ISDB|ESPETACULAR, MAS QUE HOUVE
@cfsmp3 commented on GitHub (Nov 9, 2016):
@Abhinav95 what's the current status? If still not perfect please add a code-in task :-)
@cfsmp3 commented on GitHub (Jan 20, 2017):
GSOC qualification: This issue gives 3 points.
@cfsmp3 commented on GitHub (Jan 12, 2018):
Closing since there are no updates. If the issues are still present, please reopen, @Liontooth or @Abhinav95.