Kohei Arai & Herman Tolle
Method for Real Time Text Extraction of Digital Manga Comic
Kohei Arai arai@is.saga-u.ac.jp
Information Science Department
Saga University
Saga, 840-0027, Japan
Herman Tolle emang@ub.ac.id
Software Engineering Department
Brawijaya University
Malang, 65145, Indonesia
Abstract
Manga is one of the popular items in Japan and also in the rest of the world.
Hundreds of manga are printed every day in Japan, and some printed manga books
are digitized into web manga. People then translate the Japanese
text in manga into other languages, in the conventional way, to share the
pleasure of reading manga through the internet. In this paper, we propose an
automatic method to detect and extract Japanese characters within a manga
comic page for an online language translation process. The Japanese character text
extraction method is based on our comic frame content extraction method using
a blob extraction function. Experimental results from 15 comic pages show that our
proposed method has 100% accuracy for flat comic frame extraction and comic
balloon detection, and 93.75% accuracy for Japanese character text extraction.
Keywords: E-comic, Manga, Image Analysis, Text Extraction, Text Recognition
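>>Looks like an engineering paper.. developing a tool that pulls out the characters when you read comics online? But what would this tool even be used for
>>Oops;;;; "character" didn't mean a character in the story, it meant a text character, lol;;;;; I only caught on later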
1. INTRODUCTION
Manga is one of the popular items in Japan and also in the rest of the world. Hundreds of manga books
are printed every day in Japan, and some printed manga books are digitized into web content for
reading comics through the internet. People then translate the Japanese text in manga
into other languages to share the enjoyment of reading manga with non-Japanese readers. However, people
translate the text of printed comic books manually (they call it scanlation) because
there is no automatic method for translating comic text images into other languages. The
challenge in extracting Japanese characters from manga is how to detect comic balloons and extract
text in the vertical direction, as the classic Japanese writing direction is top-down and right to left.
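>>Aha~ right now it's all in image form, so to translate the text a human has to look at it and do it by hand,
and it really would be nice to have a program that recognizes it and translates it automatically.
To get there, if you first check whether extracting just the characters works well,
you could later extend it to extracting only the speech balloons and only the text, I guess!
>>Nooo lol;;; those characters ARE the text characters, lol
>>Well, since it is of course about extracting text, the significance and usefulness of this paper must be huge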
The authors of [5] propose the concept of automatic mobile content
conversion using semantic image analysis that includes comic text extraction, but that paper did
not explain the details of text extraction.
The conventional method assumes an offline extraction process
using scanned comic images. In the internet and mobility era, we need an advanced method
for extracting text online and translating it automatically using an online translation
feature on the internet such as Google language translation.
>>Yep yep, this is the significance this paper is going for! Its purpose!
This research work is an improvement on our previous research on
automatic e-comic content adaptation [7], which was designed for extracting comic content from existing
comic images in a comic web portal and adapting it to mobile content for reading on a mobile phone.
Figure 4.a show the
>>Shouldn't that be "shows"..
>>It seems like.. it converts the image to black and white and then extracts from that
blob
(small) drop; (small) patch of color
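
To make the term concrete: in image processing, blob extraction is commonly understood as finding connected regions of foreground pixels in a black-and-white image. The following is only a minimal sketch of that general idea, not the paper's actual function. The name `extract_blobs`, the list-of-lists image format, and the use of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque


def extract_blobs(binary):
    """Return bounding boxes (x, y, w, h) of 4-connected regions of 1-pixels.

    `binary` is a hypothetical black-and-white image given as a list of rows,
    where 1 marks a foreground pixel and 0 marks background.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            while queue:
                cr, cc = queue.popleft()
                min_r, max_r = min(min_r, cr), max(max_r, cr)
                min_c, max_c = min(min_c, cc), max(max_c, cc)
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1))
    return boxes
```

Each returned box is one "blob"; a method like the one in the paper would then presumably decide from the size and shape of each blob whether it is a frame, a balloon, or text.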
<<Wow~ here come formulas I can't decipher, sigmas or whatever they are~~
Although our base method for balloon detection and text extraction performs well
in terms of correct extraction, it still produces many false detections. In the balloon detection method, many
non-balloon regions are detected as balloon candidates. This is caused by shapes in the comic image that look
like balloons and pass our selection criteria. To reduce false detections, we make a simple
modification to the balloon detection criteria by adding one process. The new process performs
text detection on each balloon candidate. If a text candidate is detected inside a balloon candidate,
it is classified as a new balloon candidate. If no text blob candidate is detected inside a
balloon, it is classified as a non-balloon. Implementing this modified method reduces false detections by about
90%. Figure 6 shows a sample of a falsely detected balloon candidate, which
is eliminated by text detection. In fact, the occurrence of false detections in the text extraction process
is not a serious problem because, in the next process, when we apply OCR to extract text
from the text blob image, a falsely detected balloon or text produces nothing after OCR
processing. On the other hand, the accuracy of the text extraction method is 93.75%, while 6.25% of
texts are not detected or the system fails to detect and extract them. Most of the failures in text extraction are
caused by non-standard positioning of the characters within a balloon, as shown in Figure 6.b.
The system also fails to extract two or more text columns that are close to one another.
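
The extra filtering step described in this passage (keep a balloon candidate only if a text candidate is detected inside it) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation. It assumes both kinds of candidates are already available as bounding boxes, and the names `Box`, `contains`, and `filter_balloons` are hypothetical.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.x <= inner.x
            and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def filter_balloons(balloon_candidates: List[Box],
                    text_candidates: List[Box]) -> List[Box]:
    """Keep only balloon candidates with at least one text candidate inside.

    Candidates without any text inside are treated as non-balloons.
    """
    return [b for b in balloon_candidates
            if any(contains(b, t) for t in text_candidates)]
```

A balloon-shaped drawing with no text blob inside it would be dropped by `filter_balloons`, which matches the kind of false detection the passage says is eliminated.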
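>>Problems with the study
- There is also text that is not inside a speech balloon, so what happens to that?
- How will it tell apart drawings that are shaped like speech balloons?
and so on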