mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-13 21:22:29 +00:00
Garbled output in some Tivo samples #38
Originally created by @cfsmp3 on GitHub (Mar 1, 2015).
Originally assigned to: @rkuchumov on GitHub.
Output is garbled in some (but not all) recordings from Tivo.
3 samples can be checked out here:
https://drive.google.com/folderview?id=0B3bPKNXgZu0-fjAxWFN2YXJSSFdZSlpRYllPSDBxTk9xUlU4dDZiUllxRE5kZXp1cEpSX2c
@Akirato commented on GitHub (Mar 7, 2015):
Hey, the codebase is large so I am not yet able to pinpoint the error.
But the error follows a pattern.
In each garbled line, a pair of characters shows up four places later than it should.
E.g., in the Outrageous Acts of Science sample:
THE SWINNG H[GI]AMMOCK -> THE SWIN[GI]{NG H}AMMOCK
IS ACTUALLY VERYIMIL[ S]AR -> IS ACTUALLY VERY[ S]{IMIL}AR
and so on.....
Same could be observed in all the samples.
Hope this helps.
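A minimal sketch of the pattern described above, assuming the displaced pair always sits exactly four characters after its correct position. The function name and its `start`, `gap` and `width` parameters are made up for illustration; nothing here is CCExtractor code.

```python
def move_pair_back(garbled: str, start: int, gap: int = 4, width: int = 2) -> str:
    """Move the `width` characters found `gap` places after `start` back to `start`."""
    displaced = garbled[start + gap : start + gap + width]  # the pair that arrived late
    skipped = garbled[start : start + gap]                  # the characters that jumped ahead of it
    return garbled[:start] + displaced + skipped + garbled[start + gap + width :]


# The two examples quoted in this comment:
print(move_pair_back("THE SWINNG HGIAMMOCK", 8))       # THE SWINGING HAMMOCK
print(move_pair_back("IS ACTUALLY VERYIMIL SAR", 16))  # IS ACTUALLY VERY SIMILAR
```

This only restates the observation; it does not detect where a displaced pair occurs, so `start` has to be found by eye.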
@anshul1912 commented on GitHub (Mar 8, 2015):
I did a regression test; this error is already present in 0.71, i.e. it has been there from the start.
@uajain commented on GitHub (Mar 8, 2015):
@anshul1912 I ran the samples with v0.71 and v0.70. Both of them give garbled output.
P.S. (being a beginner) By regression test, do you mean that this feature was fine before and broke in v0.71?
@anshul1912 commented on GitHub (Mar 8, 2015):
@mailumangjain that's what I was trying to say: it has been broken from the start
@uajain commented on GitHub (Mar 8, 2015):
So we have a new format which CCExtractor does not support? Then what is the methodology to support it?
@anshul1912 commented on GitHub (Mar 8, 2015):
These files are supported, but there is some bug in the code which needs to be taken care of.
The format is supported, but these files have something different in them, which ends up jumbling things.
If a file were not supported, CCExtractor would and should say it's not supported.
@uajain commented on GitHub (Mar 8, 2015):
Any suggestions what it might be? I've gone nuts tracking down this bug: I can see the buffer, and it contains garbled output, but I can't work out where the garbling creeps in.
@Abhinav95 commented on GitHub (Mar 8, 2015):
I just tried running the sample with v0.69.
The output is still garbled.
@Akirato commented on GitHub (Mar 8, 2015):
Can there actually be a problem in the ccx_decoders? They are the ones filling the buffers, right....
@dwhe commented on GitHub (Mar 19, 2015):
I can confirm that I have witnessed this issue with CCextractor many times with that exact same pattern. I am just a user, however, so I cannot do much more other than share my user experience with this app and confirm that I am experiencing it too. Hopefully this is helpful in some small way.
@canihavesomecoffee commented on GitHub (Mar 21, 2015):
@dwhe It would be even better if you could share some samples, or provide some more details (like input format, used parameters, etc.) :)
@vivanishin commented on GitHub (Mar 21, 2015):
Hey!
I don't get it: as far as I can see, the input is garbled (and sure enough the output is equally garbled). I delete .srt, open a video file in vlc, select subtitle track = closed captions 1 and see the exact same mutilated captions.
Is there something I don't understand that everyone else here does?
Or should I just not trust vlc either, because it has the same bugs?)
@canihavesomecoffee commented on GitHub (Mar 21, 2015):
VLC is not always correct either ;)
A sample can indeed be corrupt sometimes, but it can also happen that both VLC (for example) and CCExtractor have a bug in how they process it.
To determine whether the caption data (input) is good, it might be interesting to analyze it at a low level and compare it to what the specs define as the correct order/behaviour.
@vivanishin commented on GitHub (Mar 21, 2015):
OK, thank you.
Are there any useful tools? For example are there available software CC encoders?
Or just the ccextractor's decoders themselves and the specs?)
@dwhe commented on GitHub (Mar 28, 2015):
I can describe the input - I download the video from my Tivo units (I have a Series3 and a Premiere unit), and convert them using either KMTTG or cTivo, both of which involve the use of ccextractor. The garbling described earlier occurs with both Tivo units and both KMTTG and cTivo.
However, until this thread was created, I always thought that the garbling was the fault of the transfer process between the Tivo units and my mac - either caused by the Tivo OS or by the two software apps (KMTTG & cTivo) that downloaded the videos from the Tivo units. I thought I would pipe in because the garbling description is exactly what I have been experiencing.
It is very possible that the garbling is not caused by ccextractor, but I found it interesting that my experience is identical to what was described. It is also possible that the garbling just happens to be identical and is not necessarily related to OP’s issue.
Do let me know if you’d like any additional details from me.
@arantius commented on GitHub (Apr 11, 2015):
@dwhe is correct. The TiVo is the problem.
I've done more tests. The TiVo can transfer either PS or TS. One is faster, one is "more reliable". I've always done PS because it worked more consistently for me. Today I transferred a file as PS like I normally do, and got garbled captions. Transferring the same file as TS, the captions are OK. (Sadly, I have many dozens of archived files already transferred as PS. Ah well.)
If whatever garbling has happened to the input can still be handled correctly, I'd be a very happy user! But right now this looks like a GIGO issue.
@cfsmp3 commented on GitHub (Apr 11, 2015):
If Tivo is able to play the PS and display captions correctly then it follows that it's possible to extract correct CC from a Tivo PS.
The TS and PS code in CCExtractor is different, so correct TS and broken PS might be an issue in CCExtractor.
It would be helpful to get a PS and a TS for the same recording - that should help us figure out what's different.
@arantius commented on GitHub (Apr 11, 2015):
I've added four short clips to the original URL linked above, each in both PS and TS format, with those indicators in the names. The extracted captions are all OK for the TS format files, and the PS format files all exhibit the described issue of arbitrarily missing/transposed character pairs. (Actually, this time it looks like they're mostly/all missing, not really ever transposed.)
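One way to compare the PS and TS results for the same recording is to diff only the caption text of the two .srt files, since cue numbers and timestamps may differ between the two runs. The following is a rough sketch using Python's standard `difflib`; the helper name `srt_text_lines` and the inline sample strings are illustrative assumptions, not output from the actual clips.

```python
import difflib


def srt_text_lines(srt: str) -> list[str]:
    """Return only the caption text lines of an SRT document.

    Skips blank lines, cue numbers and timestamp lines. A caption line made up
    solely of digits would also be skipped; acceptable for a quick comparison.
    """
    lines = []
    for raw in srt.splitlines():
        line = raw.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        lines.append(line)
    return lines


# Illustrative inputs; in practice read the two .srt files produced by ccextractor.
ps_srt = "1\n00:00:01,000 --> 00:00:03,000\nIS ACTUALLY VERYIMIL SAR\n"
ts_srt = "1\n00:00:01,100 --> 00:00:03,100\nIS ACTUALLY VERY SIMILAR\n"

diff = list(difflib.unified_diff(
    srt_text_lines(ps_srt), srt_text_lines(ts_srt),
    fromfile="ps", tofile="ts", lineterm=""))
print("\n".join(diff))
```

Lines prefixed with `-` come from the PS extraction and lines prefixed with `+` from the TS extraction, so dropped or displaced pairs stand out directly.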
@rkuchumov commented on GitHub (May 8, 2016):
I've checked all PS files. I was searching for the bytes 0xB24741393403, which mark the beginning of caption blocks (picture user data with CC according to A/53 Part 4, chapter 6.2.2). There were no other blocks with CC.
In files Deadliest.. , Dirty.., NASA.., Redrum, CC blocks with text were missing in the places where they must be. So the problem is likely to be with the files generated by Tivo. By the way, when I convert TS files to PS using VLC, captions are fine.
In files Family.., How.., Outran.., a lot of CC blocks are in the wrong order, and only a few CC blocks with text are missing. So maybe control blocks are missing or something, or the problem is with CCExtractor. I need to investigate more.
test.mpg works fine with "-2" parameter.
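The marker search described above can be sketched as follows. The six bytes are the last byte of the MPEG-2 user data start code (00 00 01 B2), the ATSC identifier "GA94" (47 41 39 34) and the user data type code 0x03 for caption data. The function name and the synthetic sample are assumptions for illustration; this is not the tool used in the comment.

```python
CC_USER_DATA_MARKER = bytes.fromhex("B24741393403")


def find_cc_blocks(data: bytes) -> list[int]:
    """Return the offset of every occurrence of the caption user-data marker."""
    offsets = []
    pos = data.find(CC_USER_DATA_MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(CC_USER_DATA_MARKER, pos + 1)
    return offsets


# Synthetic sample: two markers, each preceded by the 00 00 01 start-code prefix.
sample = (b"\x00\x00\x01" + CC_USER_DATA_MARKER + b"\xff" * 4
          + b"\x00\x00\x01" + CC_USER_DATA_MARKER)
print(find_cc_blocks(sample))  # [3, 16]
```

For a real recording the whole file can be read with `open(path, "rb").read()` if it fits in memory; very large files would need chunked reading with an overlap of five bytes between chunks.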
@rkuchumov commented on GitHub (May 11, 2016):
Files Family.., How.., Outran.. are quite odd. For some reason there is a wrong order of pictures at the end of picture sequence.
For example, we have (the first column is the file offset):
After sorting by temporal reference it yields "IS ACTUALLY VERYIMIL S", which is exactly the output of CCExtractor. But if we shift the last picture by 2 positions, i.e. ..., 8, 9, 12, 10, 11, it yields the correct output (IS ACTUALLY VERY SIMIL).
The same happens throughout all 3 files. I didn't find any description for that in specs (maybe I've missed something).
Any ideas?
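The reordering described in this comment can be sketched as below. The original table of offsets is not reproduced here, so the temporal references 8 to 12 and the character pairs assigned to them are illustrative values chosen to reproduce the tail of the quoted strings; the function names are likewise made up.

```python
def by_temporal_reference(pictures):
    """Sort (temporal_reference, cc_pair) tuples the way a decoder reorders pictures."""
    return sorted(pictures, key=lambda picture: picture[0])


def move_last_earlier(pictures, positions=2):
    """Return a copy with the last picture moved `positions` places earlier."""
    reordered = list(pictures)
    last = reordered.pop()
    reordered.insert(len(reordered) - positions, last)
    return reordered


def caption_text(pictures):
    """Concatenate the caption pairs in the given picture order."""
    return "".join(pair for _, pair in pictures)


# Hypothetical end of a picture sequence, in file order.
pictures = [(12, " S"), (8, "VE"), (9, "RY"), (10, "IM"), (11, "IL")]

ordered = by_temporal_reference(pictures)
print(caption_text(ordered))                     # VERYIMIL S
print(caption_text(move_last_earlier(ordered)))  # VERY SIMIL
```

Sorting alone reproduces the garbled tail, and moving the last picture two places earlier (order 8, 9, 12, 10, 11) gives the expected text. Whether such a shift is ever justified by the specs is exactly the open question here.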
@cfsmp3 commented on GitHub (May 11, 2016):
My only suggestion -for now- is to get more samples and see if it happens with all or just some of them.
Maybe the TiVo owners in this thread can supply us with more? Also both the PS and TS versions so we can reach some conclusion by comparing.
And -since I'm asking for stuff- URLs to download the program that does that conversion :-)
@arantius commented on GitHub (May 15, 2016):
I grabbed five more arbitrary recordings, each right about one minute long, each in both TS and PS format. This was done by navigating to the TiVo's web interface and following the download links.
For each PS I ran "tivodecode" ( https://github.com/arantius/tivodecode ) to produce the decrypted mpeg. For each TS I passed it through "DirectShow Dump" (paired with TiVo Desktop; both included in the folder linked below). This is because tivodecode is much easier to use, but unable to process TS files. Unfortunately I can't share the decryption key so that's not of much use to you.
Here's the resulting files:
https://drive.google.com/open?id=0B3bPKNXgZu0-Ri1MZ0lyMllzSzg
In these cases the PS files have obvious issues, while the TS files seem good ... until you get to sample E. In this case the output for the TS is completely empty, and the PS still contains obvious issues ("TO PREVENT THE INGDIENRETS", "AFTER EA ICHNGREDIENT", "THE TANK'S CONNTTES", "QUALITY-CONTROL STINTEG.", "THE THICKNESS OF THEAINT P."). In this case VLC is able to play e.ts and display all the subtitles correctly without error (though it does have issues with e.ps, worse than ccextractor does).
I hope this helps. Also: the files I actually care about all happen to come from the channel that sample E came from.
@rkuchumov commented on GitHub (May 18, 2016):
Thanks for the samples.
I've done the same. There are caption blocks missing in all PS files. But in B and D, as I described earlier, after shifting the last picture header and the corresponding CC block by 2 positions the output is fine. In file E, the offset sometimes needs to be 1 position.
I also found that in files a.ps.tivo, b.ps, and especially e.ps.tivo, caption blocks are present and in the correct order, but they are not displayed by CCExtractor. These blocks are also at the end of a GOP. Maybe it's somehow related to the previous problem, or maybe CCExtractor doesn't flush the caption buffer.
The .ts.Tivo files don't work at all. I can't play them in VLC either. But they have caption blocks :)
@rkuchumov commented on GitHub (May 21, 2016):
Nope, in these cases the last CC block contains a PAC, which sets the cursor position. As it's misplaced, new captions overwrite the previous ones, so they are not displayed.
So these errors follow the same pattern. Either it's defined in the specs or the bug is somewhere outside of CCExtractor.
@arantius Did you write tivodecode?
@arantius commented on GitHub (May 24, 2016):
No, I did not write tivodecode.
@canihavesomecoffee commented on GitHub (May 24, 2016):
@rkuchumov Original project is here: https://sourceforge.net/projects/tivodecode/
Seems to be abandoned.
@ghost commented on GitHub (Nov 28, 2016):
Error is still reproducible with outrageous acts of science as of V 0.82. Same pattern occurs.
@mackworth commented on GitHub (Nov 28, 2016):
There is a TS compatible next-generation version of it here:
https://github.com/wmcbrine/tivodecode-ng
Despite the warning comment; it works great.
@cfsmp3 commented on GitHub (Nov 28, 2016):
Test needs to be done against the last version (github master) - 0.79 is old :-)
@ghost commented on GitHub (Nov 28, 2016):
Tested again with .82. Still gives the same error with the same pattern.
@Izaron commented on GitHub (Dec 25, 2016):
Well, if they are both bad with these videos (VLC and CCExtractor), then for whatever reason the subtitles shown at the link are bad too? So it turns out that TiVo is good and all the others are bad?
@cfsmp3 commented on GitHub (Jan 20, 2017):
GSOC qualification: This issue gives 3 points.
@thefar8 commented on GitHub (Jan 16, 2018):
I tried debug mode to resolve this issue, but it still generates the same pattern too
@arantius commented on GitHub (Jan 16, 2018):
At this point I strongly suspect GIGO.
@thefar8 commented on GitHub (Jan 16, 2018):
Using valgrind
Probably @arantius is right
@cfsmp3 commented on GitHub (Jan 16, 2018):
You should always use the debug version with valgrind, otherwise we're missing useful information such as line numbers.
Anyway, one 22-byte block is probably not a memory leak. We're just not cleaning everything up before terminating.
@thefar8 commented on GitHub (Jan 16, 2018):
Oops, I forgot to mention that I also used the debug command for those runs.
@cfsmp3 commented on GitHub (Jan 17, 2018):
That's not the output of valgrind running on a version with debug symbols, really :-)
@thefar8 commented on GitHub (Jan 17, 2018):
https://github.com/dscottbuch/cTiVo/issues/26
Found this. Maybe it helps.
@thefar8 commented on GitHub (Jan 17, 2018):
The command I used for the process above was
valgrind --leak-check=full --show-leak-kinds=all ccextractor OutrageousScience.mpg -debug
@thefar8 commented on GitHub (Jan 17, 2018):
I think the problem is in general_loop.c (because the PS data grabber lives there). I have also already run it (I have sent the valgrind result with the debug command in this issue).
VLC also shows the same subtitles as CCExtractor (I have already checked most of them).
Another possible solution is to decode the strangeheader, which is set in this code (found in stream_functions.c):
    // TiVo is also a PS
    if (ctx->startbytes[0]=='T' && ctx->startbytes[1]=='i' &&
        ctx->startbytes[2]=='V' && ctx->startbytes[3]=='o')
    {
        // The TiVo header is longer, but the PS loop will find the beginning
        dbg_print(CCX_DMT_PARSE, "detect_stream_type: detected as Tivo PS\n");
        ctx->startbytes_pos = 187;
        ctx->stream_mode = CCX_SM_PROGRAM;
        ctx->strangeheader = 1; // Avoid message about unrecognized header
    }
@cfsmp3 commented on GitHub (Nov 20, 2018):
This has been open for a really long time. Closing. If someone posts fresh samples we'll revisit, but I guess there's no point otherwise. Don't know if people still use Tivo and if the problem still exists?
@poetnerd commented on GitHub (Jan 22, 2022):
I would like to re-open this issue. Although I'm new to the ccextractor community, and may need a bit of tutelage in how to provide the most useful data, I'm rather obsessive about running bugs to ground.
I have encountered the problem where ccextractor drops pairs of characters from closed captions in files fetched from TiVo.
The insight about trying Transport vs. Program stream shows me pretty clearly that the two streams give radically different output.
I saw a discussion thread that said transport streams were unreliable for captions, and program streams were preferable. My experience is the opposite -- the program stream had bazillions of dropped character pairs, and whole captions missing, compared to the transport stream.
The particular file I used in my testing is rather large. If someone will supply a clue on how to provide just a short excerpt (i.e. how to truncate the file without completely breaking it), I'll supply an excerpt.
Here is the workflow I used:
Program Stream:
cTiVo download in format "Decrypted TiVo Show"; with "Don't delete temporaries" I get the SRT file.
Or run ccextractor on the delivered .mpg file
Transport Stream:
ccextractor seemed quite happy running on the program stream, while the run against the transport stream spat out a lot of complaints:
Run against program stream:
Partial output from run against transport stream:
Short excerpts of the resulting .srt files
Program Stream:
Transport Stream:
Diff output:
@poetnerd commented on GitHub (Jan 22, 2022):
Oh critical data I forgot to supply -- version information:
wdc-home-3:pending wdc$ ccextractor --version
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
CCExtractor detailed version info
Version: 0.94
Git commit: Unknown
Compilation date: 2022-01-16
CEA-708 decoder: C
File SHA256: Could not open file
Libraries used by CCExtractor
libGPAC Version: 1.0.1
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.37
FreeType
libhash
nuklear
libzvbi