Segmentation-Free Korean Handwriting Recognition Using Neural Network Training

Segmentation-free Bangla/Korean handwriting recognition using Neural Network training

by Steven Kim, Nishatul Majid, Advisor: Dr. Elisa H. Barney Smith

Department of Electrical and Computer Engineering and Department of Computer Science

INTRODUCTION

• The purpose of this research is to recognize handwriting without first segmenting it into characters. We implemented the method for Bangla and Korean. This work has been submitted to the International Conference on Frontiers in Handwriting Recognition (ICFHR) 2020.

• Image recognition is often used for transcription, because data stored as a file is much more reliable than data written on a physical object.

• To achieve high accuracy and productivity, we used a Faster R-CNN detector built on the pre-trained VGG-16 network through transfer learning.

• The Bangla network uses two separate networks to detect the characters and the diacritics.

• The Korean network uses one network to detect the Hangul characters.

Figure 1. Example of Bangla Network Detection

Figure 2. Example of Korean Network Detection

Image detection tools

Faster R-CNN

• Detects objects with bounding boxes of different sizes, as shown in Figures 1 and 2.

• An efficient method for detecting objects that overlap other objects, which is common across different types of handwriting.
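The overlap between two bounding boxes is commonly measured as intersection over union (IoU). The following is a minimal sketch of that measure, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the function name and the box format are illustrative and are not taken from the project code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not touch, so partially overlapping handwritten strokes fall somewhere in between.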

VGG-16

• A pre-trained network that was converted into a character-detecting network.

• Reusing a pre-trained network is more efficient than building and training a network from scratch.

• Has 16 weight layers (13 convolutional and 3 fully connected), developed by the Visual Geometry Group (VGG) at the University of Oxford.
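As a quick check of the layer count, the sketch below lists the standard VGG-16 configuration (five blocks of 3x3 convolutions followed by three fully connected layers) and counts the layers that carry weights. The variable names are illustrative only.

```python
# Standard VGG-16 layout: integers are the output channels of 3x3 convolution
# layers, "M" marks a 2x2 max-pooling layer (which has no weights).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
# Fully connected layers of the original ImageNet classifier.
VGG16_FC = [4096, 4096, 1000]

num_conv = sum(1 for layer in VGG16_CONV if layer != "M")
num_fc = len(VGG16_FC)
num_weight_layers = num_conv + num_fc

print(num_conv, num_fc, num_weight_layers)  # prints: 13 3 16
```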

PE92 Korean Handwriting Database

Our team used the PE92 database, which contains various compositions of handwritten Korean characters. It was collected by POSTECH, funded by ETRI, in 1992.

The dataset contains 2350 classes with about 100 samples per class.

• Korean handwriting recognition is known to be a difficult problem because of the many possible compositions and patterns.

• So far, researcher In-Jung Kim has applied a convolutional neural network and achieved 92.92% accuracy on PE92.

Figure 3. Example classes and samples

Manual/Automated Tagging

In order to teach a network how to recognize different objects, a tagging process is required: a bounding box is drawn around each character and labeled with its class. Covering every Korean composition is crucial, since we want the machine to recognize every possible case. Overall, we have manually tagged:

• 133 classes of different compositions

• 1468 samples

In the meantime, Dr. Barney Smith created an automated tagging framework. This framework uses existing tagged ground truth to produce more ground truth, which allowed us to obtain 133 classes with about 100 samples each.

Figure 4. Example of manual tagging and automated tagging process

K-Net

Using the manual and automated tagging, we created a network called “K-Net”. K-Net detects each individual character and combines the detections into a compound letter. In this research, we measured both the individual and the compound detection accuracy, to make sure that the network assembles the combination in the correct order.
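Why the order of the detected characters matters can be seen from how Unicode composes a Hangul syllable from its initial consonant, vowel, and optional final consonant. The sketch below uses the standard Unicode composition formula; it is not K-Net's actual combination step, which this poster does not describe, and the table and function names are hypothetical.

```python
# Jamo tables in Unicode order (compatibility jamo are used for readability).
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")        # 19 initial consonants
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")     # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]          # "no final" + 27 finals

HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable


def compose_syllable(initial, medial, final=""):
    """Combine jamo into one precomposed Hangul syllable.

    Raises ValueError if a jamo is not valid in the given position.
    """
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    return chr(HANGUL_BASE + (i * len(MEDIALS) + m) * len(FINALS) + f)


print(compose_syllable("ㅎ", "ㅏ", "ㄴ") + compose_syllable("ㄱ", "ㅡ", "ㄹ"))  # prints: 한글
```

The same three consonant shapes can appear as an initial or as a final, so assigning a detection to the wrong position produces a different syllable or no valid syllable at all.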

Figure 5. Example of K-Net detection

Training Parameters

• All images were resized to 600 pixels at their smallest dimension

during training.

• Stochastic Gradient Descent with Momentum (SGDM) was used for training.

• The initial learning rate was set to 0.001 and the maximum number of epochs to 10.

• Overlap ratios up to 0.6 were used for negative training.

• Number of region proposals to randomly sample from was set to

64.

• Increasing the number of epochs or regions, or decreasing the learning rate, usually improves the training result but makes training slower.
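The settings listed above can be collected in one place. The sketch below is a framework-independent illustration, not the project's training script: the dictionary keys, the function names, and the momentum value of 0.9 are assumptions (the momentum is not stated here). It also shows how a smallest-dimension resize and a single SGDM update work.

```python
TRAINING_PARAMS = {
    "smallest_image_dimension": 600,   # pixels
    "optimizer": "sgdm",
    "initial_learning_rate": 0.001,
    "max_epochs": 10,
    "negative_overlap_max": 0.6,       # upper overlap ratio for negative samples
    "num_regions_to_sample": 64,
    "momentum": 0.9,                   # ASSUMPTION: not given in the poster
}


def resized_shape(height, width, smallest=TRAINING_PARAMS["smallest_image_dimension"]):
    """Scale (height, width) so the smaller side equals `smallest`, keeping aspect ratio."""
    scale = smallest / min(height, width)
    return round(height * scale), round(width * scale)


def sgdm_step(weight, velocity, gradient,
              learning_rate=TRAINING_PARAMS["initial_learning_rate"],
              momentum=TRAINING_PARAMS["momentum"]):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


print(resized_shape(300, 450))    # prints: (600, 900)
print(sgdm_step(1.0, 0.0, 2.0))
```

A smaller learning rate shrinks each update step, which is why it tends to need more epochs to reach the same point.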

Training/Testing Environment (R2 Cluster)

Training with VGG-16 is computationally demanding: a low-end laptop without a dedicated graphics card may take a long time to finish the job. However, thanks to the R2 computer cluster provided by the Research Computing Department, we were able to offload the computation by simply uploading the network and running it on the server.

Testing Result

Since we used a subset of the whole database, we were able to finish this process quickly, but with some loss of accuracy. Using the autonomous tagging, we obtained a Jamo Recognition Accuracy (JRA) of 91.22% and a Syllable Recognition Accuracy (SRA) of 84.66%.
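How the two scores differ can be illustrated with a small sketch. It assumes a simplified definition that may differ from the exact one used in the referenced article: JRA is taken as the fraction of ground-truth jamo recognized correctly at the same position, and SRA as the fraction of syllables whose whole jamo sequence is correct. The function name and the sample data are made up for illustration.

```python
def recognition_accuracies(truth, predicted):
    """Return (JRA, SRA) for parallel lists of syllables, each a tuple of jamo."""
    jamo_total = 0
    jamo_correct = 0
    syllable_correct = 0
    for true_jamo, pred_jamo in zip(truth, predicted):
        jamo_total += len(true_jamo)
        # Position-wise comparison; missing predictions simply do not count.
        jamo_correct += sum(t == p for t, p in zip(true_jamo, pred_jamo))
        # A syllable is correct only if every jamo matches, in order.
        syllable_correct += tuple(true_jamo) == tuple(pred_jamo)
    return jamo_correct / jamo_total, syllable_correct / len(truth)


# Hypothetical example: the vowel of the second syllable is misrecognized.
truth = [("ㅎ", "ㅏ", "ㄴ"), ("ㄱ", "ㅡ", "ㄹ")]
predicted = [("ㅎ", "ㅏ", "ㄴ"), ("ㄱ", "ㅜ", "ㄹ")]

jra, sra = recognition_accuracies(truth, predicted)
print(f"JRA = {jra:.2%}, SRA = {sra:.2%}")  # prints: JRA = 83.33%, SRA = 50.00%
```

Because a single wrong jamo invalidates the whole syllable, SRA is typically lower than JRA, which is consistent with the reported 91.22% and 84.66%.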

Figure 6. Result comparison data from the article “Autonomous Data Tagging for Offline Handwriting Recognition: Tested with Bangla and Korean Scripts” by Nishatul Majid and Elisa H. Barney Smith

Conclusion

This project is derived from Nishatul Majid’s framework for Bangla offline handwriting recognition. Applying it to Korean is a new direction for his approach.

Most scripts place letters side by side, but Korean first forms each compound letter from individual characters arranged in different compositions, and then writes these compound letters from left to right.

Our maximum recognition accuracy was 84.66% (SRA). We have not yet done any post-processing or used the whole dataset, so this framework may be able to reach a higher accuracy.

References

1. N. Majid and E. H. Barney Smith, “Segmentation-Free Bangla Offline Handwriting Recognition using Sequential Detection of Characters and Diacritics with a Faster R-CNN,” in International Conference on Document Analysis and Recognition (ICDAR), September 2019.

2. G.-R. Park, I.-J. Kim, and C.-L. Liu, “An evaluation of statistical methods in handwritten hangul recognition,” International Journal on Document Analysis and Recognition (IJDAR), vol. 16, no. 3, pp. 273-283, 2013.

3. I.-J. Kim and X. Xie, “Handwritten Hangul recognition using deep convolutional neural networks,” International Journal on Document Analysis and Recognition (IJDAR), vol. 18, no. 1, pp. 1-13, 2015.