Issues with ISDB (Brazil) support #115

Open
opened 2026-01-29 16:35:42 +00:00 by claunia · 0 comments
Owner

Originally created by @cfsmp3 on GitHub (Feb 20, 2016).

Originally assigned to: @Abhinav95 on GitHub.

All seem easy to solve (so possibly a good chance for GSOC applicants to get points).

On the Brazilian files, the text is great (ignore the failure to show UTF-8 properly), but the timestamps don't increment -- and the second timestamp doesn't respond to the -delay switch:

ccextractor -datets -pn 48352 -noru -out=ttxt -unixts 0 -delay 1454292360000 -o 2016-02-01_0206_BR_Globo_Test_02.ccx2.out /tv/2016/2016-02/2016-02-01/2016-02-01_0206_BR_Globo_Test_02.mpg

csa@esfinge:/tv/2016/2016-02/2016-02-01$ cat 2016-02-01_0206_BR_Globo_Test_02.ccx2.out 
20160201020600,000|19700101000000,002|ISDB|PARTICIPOU DAS PRINCIPAIS
20160201020600,000|19700101000000,002|ISDB|JOGADAS OFENSIVAS DO... OUTRO
20160201020600,000|19700101000000,002|ISDB|TOQUE, E DEPOIS ELE ENTRA PELA
20160201020600,000|19700101000000,002|ISDB|JOGADAS OFENSIVAS DO... OUTRO
20160201020600,000|19700101000000,002|ISDB|TOQUE, E DEPOIS ELE ENTRA PELA
20160201020600,000|19700101000000,002|ISDB|ESQUERDA, OLHA L� O ESPA�O QUE
20160201020600,000|19700101000000,002|ISDB|TINHA O JORGE HENRIQUE PARA
20160201020600,000|19700101000000,002|ISDB|TENTAR DAR UM TOQUE. E NEN�
20160201020600,000|19700101000000,002|ISDB|AINDA FAZ O GESTO, P�, PODIA TER
20160201020600,000|19700101000000,002|ISDB|TOCADO PARA MIM.
20160201020600,000|19700101000000,002|ISDB|>> [LUIS ROBERTO] LANCES DO
20160201020600,000|19700101000000,002|ISDB|PRIMEIRO TEMPO EM S�O JANEIRO
20160201020600,000|19700101000000,002|ISDB|PASSO AIR, A COLINA HIST�RICA,
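
For reference, the arithmetic behind the -delay value can be checked by hand. This is only a sketch, assuming -delay is a plain millisecond offset that should be added to both columns and that the result is read as UTC; the helper name `shifted` is made up for illustration and is not part of ccextractor.

```python
from datetime import datetime, timedelta, timezone

# Value passed to -delay in the command above, in milliseconds.
DELAY_MS = 1454292360000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shifted(ms_from_reference, delay_ms=DELAY_MS):
    """Format (reference time + delay) the way the output columns are written.

    Hypothetical helper: assumes the delay is simply added to the
    millisecond offset and the result is interpreted as UTC.
    """
    moment = EPOCH + timedelta(milliseconds=ms_from_reference + delay_ms)
    return moment.strftime("%Y%m%d%H%M%S") + ",%03d" % (moment.microsecond // 1000)


print(shifted(0))  # 20160201020600,000
print(shifted(2))  # 20160201020600,002
```

Under that assumption the first column (20160201020600,000) already carries the delay, while the second column (19700101000000,002) looks like the raw 2 ms value with no delay applied.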

If I include the -UCLA switch, I do get the offset on the second timestamp:

ccextractor -datets -pn 48352 -UCLA -noru -ttxt -unixts 0 -delay 1454292360000 -o $FIL.ccx3.out /tv/2016/2016-02/2016-02-01/$FIL.mpg

but no incremental time, so this appears to be the only actual problem.
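
A quick way to confirm the missing increment across a whole output file is to compare each line's start time with the previous one. The sketch below assumes the pipe-separated layout shown in the listing above (start|end|mode|text); the function names are illustrative only and not part of ccextractor.

```python
from datetime import datetime, timedelta


def parse_ts(field):
    """Parse a 'YYYYMMDDHHMMSS,mmm' timestamp as it appears in the output."""
    stamp, millis = field.split(",")
    return datetime.strptime(stamp, "%Y%m%d%H%M%S") + timedelta(milliseconds=int(millis))


def find_stalled(lines):
    """Return indices of lines whose start time does not advance
    past the start time of the line before them."""
    stalled = []
    previous = None
    for index, line in enumerate(lines):
        # Only the first field is needed; the caption text may contain anything.
        start = parse_ts(line.split("|", 3)[0])
        if previous is not None and start <= previous:
            stalled.append(index)
        previous = start
    return stalled


sample = [
    "20160201020600,000|19700101000000,002|ISDB|PARTICIPOU DAS PRINCIPAIS",
    "20160201020600,000|19700101000000,002|ISDB|JOGADAS OFENSIVAS DO... OUTRO",
    "20160201020600,000|19700101000000,002|ISDB|TOQUE, E DEPOIS ELE ENTRA PELA",
]
print(find_stalled(sample))  # [1, 2]
```

With correct timing, the list should come back empty (or nearly so) for a real broadcast.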

I also get 'NA' in the third column. We no longer use the teletext page number that the -UCLA switch adds to the third column, so it should be removed.

I've placed the file on http://vrnewsscape.ucla.edu/dropbox/2016-02-01_BR_Globo_Test_01.mpg


Reference: starred/ccextractor#115