GSOC - File analysis functionality #24

Closed
opened 2026-01-29 16:33:09 +00:00 by claunia · 7 comments

Originally created by @cfsmp3 on GitHub (Apr 17, 2014).

Originally assigned to: @rkuchumov on GitHub.

We need a feature that, using everything that is already in place, consumes part of a stream (up to a limit specified by the user) and generates an easy-to-parse report.

The limit can be time (for example, the first minute of the file), size (such as the first 10 MB), or until something is found (for example, stop as soon as captions are found).
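The three kinds of limit could be folded into a single stop check that the processing loop consults after each chunk it reads. This is a minimal sketch, not CCExtractor code: the names `ScanLimit` and `should_stop` are hypothetical, and any limit left as `None` is simply not applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanLimit:
    """User-specified limit on how much of the stream to consume (hypothetical)."""
    max_seconds: Optional[float] = None  # e.g. 60.0 for "the first minute"
    max_bytes: Optional[int] = None      # e.g. 10 * 1024 * 1024 for "the first 10 MB"
    stop_on_captions: bool = False       # stop as soon as any captions are found

    def should_stop(self, elapsed_seconds: float, bytes_read: int,
                    captions_found: bool) -> bool:
        """Return True once any configured limit has been reached.

        elapsed_seconds is stream (media) time, not wall-clock time.
        """
        if self.max_seconds is not None and elapsed_seconds >= self.max_seconds:
            return True
        if self.max_bytes is not None and bytes_read >= self.max_bytes:
            return True
        return self.stop_on_captions and captions_found


limit = ScanLimit(max_bytes=10 * 1024 * 1024, stop_on_captions=True)
print(limit.should_stop(12.5, 4 * 1024 * 1024, captions_found=False))  # False
print(limit.should_stop(12.5, 4 * 1024 * 1024, captions_found=True))   # True
```

Combining the limits with "whichever comes first" semantics means a user could ask for, say, "the first 10 MB, or earlier if captions show up" without a separate mode for each case.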

The report will be text sent to stdout, with lines like these:
File: ...........
AnyCC608: Yes
AnyCC708: No
Programs: 3
PrimaryLanguagePresent: Yes
SecondaryLanguagePresent: No
XDSPresent: Yes

and so on
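What keeps such a report easy to parse is one `Key: Value` pair per line, with booleans always spelled `Yes`/`No`. A minimal formatter sketch, assuming the collected values are available as a dictionary; `format_report` and the file name `sample.ts` are illustrative only.

```python
def format_report(filename: str, fields: dict) -> str:
    """Render report fields as one 'Key: Value' line each (hypothetical helper)."""
    lines = [f"File: {filename}"]
    for key, value in fields.items():
        # Booleans become Yes/No so every line can be parsed the same way.
        if isinstance(value, bool):
            value = "Yes" if value else "No"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(format_report("sample.ts", {
    "AnyCC608": True,
    "AnyCC708": False,
    "Programs": 3,
    "PrimaryLanguagePresent": True,
    "SecondaryLanguagePresent": False,
    "XDSPresent": True,
}), end="")
```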

This functionality is easy to add, since all the information is already in the internal state. We just need the ability to display it in an easy-to-parse format.

claunia added the enhancement and help wanted labels 2026-01-29 16:33:09 +00:00

@cfsmp3 commented on GitHub (May 28, 2014):

I'm trying to assign this to both Ruslan and Willem, but it looks like I can't.

Anyway:
Ruslan => Changes in CCExtractor
Willem => Consume the output in a way that is useful for the test code


@canihavesomecoffee commented on GitHub (May 28, 2014):

Looks interesting, but would it be possible to clarify this a little?
The way I'm understanding it now is:

  1. run CCExtractor just as normal, but add an extra parameter to it.
  2. Instead of outputting the normal info, it shows the info you put in the first post
  3. This data should go into a report

Did I understand this correctly, or did you have it envisioned in another way?


@cfsmp3 commented on GitHub (May 28, 2014):

That's pretty much it. The idea is that, since CCExtractor already figures out most of the information, we make it easy for external programs that want that information, but not the subtitles, to use it. For example, I get emails from time to time from people who need to know whether a file has captions but don't care about the captions themselves.
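For that "does this file have captions at all?" use case, an external program only needs two fields of the report. A sketch, assuming the report keeps the `AnyCC608`/`AnyCC708` keys and `Yes`/`No` values from the first post; `parse_report` and `has_captions` are hypothetical names.

```python
def parse_report(text: str) -> dict:
    """Parse 'Key: Value' lines into a dict; lines without ': ' are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_captions(report_text: str) -> bool:
    """True if the report says any CEA-608 or CEA-708 captions were found."""
    fields = parse_report(report_text)
    return fields.get("AnyCC608") == "Yes" or fields.get("AnyCC708") == "Yes"


report = "File: sample.ts\nAnyCC608: Yes\nAnyCC708: No\nPrograms: 3\n"
print(has_captions(report))  # True
```

Because `partition` splits only at the first `": "`, a value that itself contains a colon (such as a file name) is kept intact.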



@canihavesomecoffee commented on GitHub (May 28, 2014):

OK, that looks good. I assume I'll have to wait for Ruslan, and after that I can alter the tester to support it.


@cfsmp3 commented on GitHub (May 29, 2014):

Yes. Just discuss with Ruslan the most convenient output format for you :-)



@rkuchumov commented on GitHub (Jun 3, 2014):

I'll stick to time as a limit for now.
What parameter should I use to print reports: -out=report, --report, or something else? For the time limit there is already -endat.
If we output the report to stdout, we should mute all other printed info, right? Maybe just keep the file name as a separator in case there are several input files.
As for the report content: some of the useful data is in global variables, some in local ones. I'll create a global structure with variables holding the current info for the report.
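The plan of gathering scattered global and local values into one structure might look roughly like the following. This is a Python stand-in for what would be a C struct inside CCExtractor; the class name, the fields (a subset of the report keys) and the `on_*` hook methods are all hypothetical.

```python
import sys
from dataclasses import dataclass, field
from typing import Optional, Set, TextIO


def _yes_no(flag: bool) -> str:
    return "Yes" if flag else "No"


@dataclass
class ReportState:
    """One structure holding the current report info (hypothetical)."""
    filename: str
    any_cc608: bool = False
    any_cc708: bool = False
    xds_present: bool = False
    program_numbers: Set[int] = field(default_factory=set)

    # Hooks the demuxer/decoders would call while the stream is processed.
    def on_program(self, number: int) -> None:
        self.program_numbers.add(number)  # a set, so repeats are counted once

    def on_cc608(self) -> None:
        self.any_cc608 = True

    def on_cc708(self) -> None:
        self.any_cc708 = True

    def on_xds(self) -> None:
        self.xds_present = True

    def write(self, out: Optional[TextIO] = None) -> None:
        """Print the report; the File: line doubles as a separator between inputs."""
        if out is None:
            out = sys.stdout
        out.write(f"File: {self.filename}\n")
        out.write(f"AnyCC608: {_yes_no(self.any_cc608)}\n")
        out.write(f"AnyCC708: {_yes_no(self.any_cc708)}\n")
        out.write(f"Programs: {len(self.program_numbers)}\n")
        out.write(f"XDSPresent: {_yes_no(self.xds_present)}\n")


state = ReportState("sample.ts")
for number in (1, 2, 3, 2):  # program 2 is seen twice
    state.on_program(number)
state.on_cc608()
state.write()
```

Having the decoders only set flags, and printing once at the end, is what makes muting the normal output straightforward: nothing report-related is printed until the limit is reached.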


@canihavesomecoffee commented on GitHub (Jun 3, 2014):

I think the -out parameter is the best option, since it clearly indicates that no other output will be generated. If we used --report, that might be unclear to others.

And yes, other info should be suppressed so that only the report is displayed. For multiple files I'd use the file name as a separator, or a blank line.
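If every report starts with its `File:` line, the tester can split the combined stdout of a multi-file run without depending on blank lines. A minimal sketch, assuming that convention; `split_reports` is a hypothetical helper, and blank separator lines are tolerated but not required.

```python
def split_reports(output: str) -> dict:
    """Map each input file name to its parsed report fields.

    Assumes every report begins with a 'File: <name>' line; other lines are
    'Key: Value' pairs, and blank or unrecognised lines are skipped.
    """
    reports = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # blank separator line or free-form text
        if key == "File":
            current = reports.setdefault(value.strip(), {})
        elif current is not None:
            current[key.strip()] = value.strip()
    return reports


combined = (
    "File: a.ts\nAnyCC608: Yes\nPrograms: 1\n"
    "\n"
    "File: b.mpg\nAnyCC608: No\nPrograms: 2\n"
)
print(split_reports(combined))
# {'a.ts': {'AnyCC608': 'Yes', 'Programs': '1'}, 'b.mpg': {'AnyCC608': 'No', 'Programs': '2'}}
```

Keying on the file name rather than on blank lines means the tester still works if either separator style is chosen, as long as the `File:` line is always printed first.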

Reference: starred/ccextractor#24