Brazilian ISDB ignores levdistmincnt, levdistmaxpct, and unixts #307

Closed
opened 2026-01-29 16:40:33 +00:00 by claunia · 4 comments

Originally created by @Liontooth on GitHub (May 28, 2017).

Originally assigned to: @Abhinav95 on GitHub.

Please prefix your issue with one of the following: [BUG]

CCExtractor version (using the --version parameter preferably) : 0.85 (latest zip file from github)

In raising this issue, I confirm the following (please check boxes, eg [X]):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [x] I have checked that the issue I'm posting isn't already reported.
  • [x] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues.
  • [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [x] I have used the latest available version of CCExtractor to verify this issue exists.

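See related issues:

  • https://github.com/CCExtractor/ccextractor/pull/336
  • https://github.com/CCExtractor/ccextractor/pull/334
  • https://github.com/CCExtractor/ccextractor/issues/284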

It's possible that the -pn and -unixts errors I'm seeing are regressions. It's also possible that the files are subtly different. The earlier test files were recorded with a PixelView Play TV USB SBTVD Full Seg stick. The attached file was recorded with the new Brazilian HDHomeRun device.

My familiarity with the project is as follows (check one, eg [X]):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [ ] I absolutely love CCExtractor, but have not contributed previously.
  • [x] I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [x] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [ ] Windows - [x] Linux - [ ] Mac
  • What were the arguments used?
    -datets -ttxt -UCLA -noru -utf8 -levdistmincnt 2 -levdistmaxpct 10 -unixts 1495845901

Video links
http://vrnewsscape.ucla.edu/dropbox/2017-05-27_0045_BR_Record_Jornal_da_Record.mpg

Please make the affected input file available for us (no screenshots, those don't help!). Public links to Dropbox, Google Drive, etc, are all fine. If it is not possible to make it available publicly, send us a private invitation (both Dropbox and Google Drive allow that). In this case we will download the file and upload it to the private developer repository.

Do not upload your file to any location that will require us to sign up or endure a wait list, slow downloads, etc. If your upload expires make sure you keep it active somehow (replace links if needed). Keep in mind that while we go over all tickets some may take a few days, and it's important we have the file available when we actually need it.

Additional information


CCExtractor 0.85 does a great job with Brazilian ISDB captions, but there are four problems with switches:

  1. If the argument "-pn $PN" is used, ccextractor does not recognize the file as ISDB. This is not a serious bug, since everything works fine if the argument is removed, but it is not the expected behavior.

  2. The -unixts argument is ignored. For instance, using -unixts 1495845901 with -datets and -UCLA produces this output:

19700101000000.901|19700101000103.494|ISDB|>> POR FAVOR, POR
19700101000103.494|19700101000105.273|ISDB|>> POR FAVOR, POR AQUI,

The unix epoch offset is not working. This is the most serious bug.

  3. The argument "-levdistmincnt" is ignored. For instance, using -levdistmincnt 2 produces this output:

19700101000000.901|19700101000103.494|ISDB|>> POR FAVOR, POR
19700101000103.494|19700101000105.273|ISDB|>> POR FAVOR, POR AQUI,
19700101000105.273|19700101000107.002|ISDB|>> POR FAVOR, POR AQUI,
19700101000105.273|19700101000107.002|ISDB|RAINHA!
19700101000107.002|19700101000108.480|ISDB|RAINHA!
19700101000107.002|19700101000108.480|ISDB|>> QUE LUGAR HORRÍVEL, TEM
19700101000108.480|19700101000109.758|ISDB|>> QUE LUGAR HORRÍVEL, TEM
19700101000108.480|19700101000109.758|ISDB|CERTEZA QUE É AQUI?
19700101000109.758|19700101000111.336|ISDB|CERTEZA QUE É AQUI?

Most lines are duplicated; deduplication is not happening.

  4. The argument -levdistmaxpct to help deduplicate also appears to be ignored.
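
For illustration, here is a minimal Python sketch of the kind of Levenshtein-based deduplication these two switches are expected to control. It is not CCExtractor's implementation: the function names are hypothetical, and the way the two thresholds are combined (an absolute edit distance that is always tolerated, plus a percentage of the shorter string's length) is an assumption made for this example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def is_near_duplicate(a, b, min_cnt=2, max_pct=10):
    """Assumed rule: tolerate min_cnt edits, or max_pct percent of the shorter string."""
    allowed = max(min_cnt, min(len(a), len(b)) * max_pct // 100)
    return levenshtein(a, b) <= allowed


def dedupe(lines, min_cnt=2, max_pct=10):
    """Drop a line when it is a near-duplicate of the last line that was kept."""
    kept = []
    for text in lines:
        if kept and is_near_duplicate(kept[-1], text, min_cnt, max_pct):
            continue
        kept.append(text)
    return kept


# Caption texts taken from the output quoted above.
captions = [
    ">> POR FAVOR, POR",
    ">> POR FAVOR, POR AQUI,",
    ">> POR FAVOR, POR AQUI,",
    "RAINHA!",
    "RAINHA!",
    ">> QUE LUGAR HORRÍVEL, TEM",
    ">> QUE LUGAR HORRÍVEL, TEM",
    "CERTEZA QUE É AQUI?",
    "CERTEZA QUE É AQUI?",
]
for line in dedupe(captions):
    print(line)
```

Under these assumptions the nine quoted lines reduce to five. The first two lines are both kept because they differ by six characters, which exceeds the tolerated distance of 2.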

Cheers,
Dave


@Abhinav95 commented on GitHub (Jun 20, 2017):

I will take a look at this around the end of this week. I presume it should not be a difficult fix but let us see.


@Abhinav95 commented on GitHub (Jul 6, 2017):

For future reference:

  • -unixts 1495845901 can be made to work by turning it into -unixts 0 -delay 1495845901000. Basically multiply the unix timestamp by 1000 and pass it to the delay flag.
  • #746 fixes duplication in ISDB when -noru is passed
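
The workaround in the first bullet can be sketched in Python. The helper names `apply_unixts_workaround` and `shifted_timestamp` are hypothetical and not part of CCExtractor. The sketch assumes that `-delay` takes milliseconds (as the multiplication by 1000 implies), that the delay is simply added to each caption time, and that the `-datets` output is in UTC.

```python
from datetime import datetime, timedelta, timezone


def apply_unixts_workaround(args):
    """Rewrite `-unixts N` as `-unixts 0 -delay N*1000` (seconds to milliseconds)."""
    out = []
    i = 0
    while i < len(args):
        if args[i] == "-unixts" and i + 1 < len(args):
            out += ["-unixts", "0", "-delay", str(int(args[i + 1]) * 1000)]
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out


def shifted_timestamp(caption_ms, delay_ms):
    """Format caption time plus delay as YYYYMMDDHHMMSS.mmm (UTC assumed)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    t = epoch + timedelta(milliseconds=caption_ms + delay_ms)
    return t.strftime("%Y%m%d%H%M%S") + ".%03d" % (t.microsecond // 1000)


args = ["-datets", "-ttxt", "-UCLA", "-noru", "-utf8", "-unixts", "1495845901"]
print(apply_unixts_workaround(args))
print(shifted_timestamp(901, 0))              # 19700101000000.901 (the reported output)
print(shifted_timestamp(901, 1495845901000))  # 20170527004501.901
```

Under these assumptions, the first caption of the reported file would start at 20170527004501.901, which agrees with the 2017-05-27_0045 prefix in the file name.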

@Abhinav95 commented on GitHub (Jul 6, 2017):

@Liontooth Can this issue be closed now? levdist is not needed in ISDB since other deduplication measures are in place. The required timestamps can be obtained by the workaround I mentioned.


@Liontooth commented on GitHub (Oct 16, 2017):

The issues appear to be resolved, so I'm closing. Thank you!

Reference: starred/ccextractor#307