[PROPOSAL] Extract subtitles in a Chinese newscast #379

Closed
opened 2026-01-29 16:42:25 +00:00 by claunia · 7 comments
Owner

Originally created by @Liontooth on GitHub (Jan 24, 2018).

Originally assigned to: @Abhinav95 on GitHub.

The following video file was recorded in mainland China, using Joker-tv with the DTMB television standard. When I watch it, I see what look like subtitles / captions. Can CCExtractor see them? I was not able to get it to work. I did not try OCR, which may be what is required.

http://vrnewsscape.ucla.edu/dropbox/2018-01-09_2033_CN_CCTV1_%e6%96%b0%e9%97%bb1+1.mpg

Cheers,
David

claunia added the difficulty: hard, OCR, GSoC-related, DTMB, JokerTV labels 2026-01-29 16:42:25 +00:00

@jimboH commented on GitHub (Feb 20, 2018):

I would like to work on this issue.


@saurabhshri commented on GitHub (Feb 20, 2018):

Sure, just go ahead.



@thealphadollar commented on GitHub (Apr 11, 2018):

@Liontooth @cfsmp3

The subtitles in the given video file are burned into the picture and can be extracted using the -hardsubx parameter:

./ccextractor 2018-01-09_2033_CN_CCTV1_新闻1+1.mpg -hardsubx -ocrlang chi_sim

Displaying them in a video player requires compatible fonts, but they are indeed being extracted, as the image below shows.

[Image: screenshot from 2018-04-11 21-21-17]

The output contains errors caused by OCR inaccuracies, but those are outside the scope of this issue and belong to a separate GSoC project.
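
For anyone who wants to script the command quoted in this comment, here is a minimal Python sketch. It only uses the two options that appear in the thread (`-hardsubx` and `-ocrlang chi_sim`). The helper names `local_name` and `hardsubx_command` are hypothetical, and the sketch assumes that the binary sits at `./ccextractor`, that it was built with hardsubx/OCR support, and that Tesseract's `chi_sim` language data is installed. It also shows how the percent-encoded file name in the download URL maps to the file name used on the command line.

```python
import shutil
import subprocess
from pathlib import PurePosixPath
from typing import List
from urllib.parse import unquote, urlsplit

# Sample URL from the original report (file name is percent-encoded UTF-8).
SAMPLE_URL = (
    "http://vrnewsscape.ucla.edu/dropbox/"
    "2018-01-09_2033_CN_CCTV1_%e6%96%b0%e9%97%bb1+1.mpg"
)


def local_name(url: str) -> str:
    """Return the percent-decoded file name from the URL path.

    unquote() leaves '+' untouched, which is what we want here because
    the '+' is part of the programme title, not an encoded space.
    """
    return PurePosixPath(unquote(urlsplit(url).path)).name


def hardsubx_command(video: str, ocr_lang: str = "chi_sim",
                     binary: str = "./ccextractor") -> List[str]:
    """Build the argument list quoted in the comment (hypothetical helper)."""
    return [binary, video, "-hardsubx", "-ocrlang", ocr_lang]


if __name__ == "__main__":
    cmd = hardsubx_command(local_name(SAMPLE_URL))
    print(" ".join(cmd))
    # Assumption: only run when the binary is actually present and executable.
    if shutil.which(cmd[0]):
        subprocess.run(cmd, check=False)
```

Passing the arguments as a list avoids shell quoting problems with the non-ASCII characters and the `+` in the file name.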


@cfsmp3 commented on GitHub (Apr 11, 2018):

Assigned to @Abhinav95, who will in turn assign it to whichever GSoC student(s) he sees fit.


@fewwwww commented on GitHub (Mar 8, 2021):

I was just browsing GSoC issues, looking to work on the Flutter project, but I think I can help out a little with this issue.

Here is the national standard document for DTMB (GB20600-2006), 130 pages, all in Chinese:
https://www.doc88.com/p-810688531386.html
Here is a patent for hardware that can separate audio and video signals and display subtitles on DTMB equipment:
https://nxgp.cnki.net/kcms/detail?v=kxaUMs6x7-4I2jr5WTdXti3zQ9F92xu0N5Lim4gHJeVFMNAZBuVUfzvmz2LuJgb7bn8rlgaJH4AQ98pqdK9FqNLQT3L2E_Cs&uniplatform=NZKPT
Here are many recordings of DTMB television broadcasts on Bilibili (they can be downloaded with tools like you-get):
https://search.bilibili.com/all?keyword=dtmb&from_source=nav_search_new

Feel free to take this on, or to write proposals using these links. I would really like to work on this issue, but whoever solves it definitely needs to know two languages: C and Chinese. I happen to know Chinese but not much C.


@cfsmp3 commented on GitHub (Mar 29, 2021):

@fewwwww This is great, thanks!
Let's hope there's a brave student that knows C and feels like doing this with your help :-)


@cfsmp3 commented on GitHub (Mar 22, 2023):

Closing to keep track of our Chinese wishlist here: https://github.com/CCExtractor/ccextractor/issues/224


Reference: starred/ccextractor#379