mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2026-02-03 21:23:48 +00:00
[PR #491] [CLOSED] Split to sentences implementation #1306
📋 Pull Request Information
Original PR: https://github.com/CCExtractor/ccextractor/pull/491
Author: @maxkoryukov
Created: 12/8/2016
Status: ❌ Closed
Base: master ← Head: master
📝 Commits (8)
eff5fe0 Proto for tests and proto for the sentence buf
c260158 Tests were added
02915e4 Sentence buffer : some slight changes
8239932 Break to sentences works good.
57b397d Fix build. No extra comments, removed extra include instructions
808d1f6 SBS: First working version with dup detection
071b751 Additional tests for SBS
f1afd74 Final solution for sentence breaker
📊 Changes
10 files changed (+1051 additions, -285 deletions)
📝 .gitignore (+6 -0)
📝 src/lib_ccx/ccx_encoders_common.c (+180 -183)
📝 src/lib_ccx/ccx_encoders_common.h (+9 -11)
📝 src/lib_ccx/ccx_encoders_splitbysentence.c (+413 -91)
➕ src/lib_ccx/debug_def.h (+11 -0)
➕ tests/Makefile (+59 -0)
➕ tests/README.md (+43 -0)
➕ tests/ccx_encoders_splitbysentence_suite.c (+305 -0)
➕ tests/ccx_encoders_splitbysentence_suite.h (+4 -0)
➕ tests/runtest.c (+21 -0)
📄 Description
Hello!
This PR contains the implementation of Sentence Buffer: Split
Usage:
Currently, it works only with sub->type == CC_BITMAP. Implementation details are in the comments on the PR.
Long example
New output
Old output
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.