Transferring OCR results to DjVu?

Discussion about CuneiForm

Post by monday2000 » Mon Sep 28, 2009 4:33 pm

It would be nice to adapt CuneiForm for OCR of DjVu files. To be precise: to convert the OCR results into the format of the DjVu TXT (hidden text) layer.

The CuneiForm Batch subprogram, available in the English-interface distribution http://www.cuneiform.ru/downloads/setup ... rm_eng.exe, can optionally produce so-called FED files (files with the *.FED extension).

Screenshot:

[Image]

Choose "NewShortcut5": it is the shortcut to the Batch subprogram in the CuneiForm package. (A weird name, probably a developer's mistake; in the Russian version it is called "Batch OCR".)

When you open the Batch subprogram, create a new batch:

[Image]

While setting up the new batch, select "FED files" at the corresponding step:

[Image]

Run your Batch OCR task. When it finishes, the FED files will be in the output folder.
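Before any conversion, the FED files have to be matched to DjVu pages. Below is a minimal Python sketch, assuming one FED file per page and file names that sort in page order (for example page1.fed, page2.fed, ..., page10.fed). `fed_files_in_page_order` is a hypothetical helper, not part of CuneiForm:

```python
import re
from pathlib import Path


def natural_key(name: str) -> list:
    # Split "page10.fed" into ["page", 10, ".fed"] so numeric parts compare as numbers
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def fed_files_in_page_order(output_dir: str) -> list[tuple[int, Path]]:
    """Return (page_number, path) pairs for the FED files in output_dir.

    Assumes one FED file per page and names that sort in page order.
    """
    # Match the extension case-insensitively (*.FED or *.fed)
    files = [p for p in Path(output_dir).iterdir()
             if p.is_file() and p.suffix.lower() == ".fed"]
    files.sort(key=lambda p: natural_key(p.name))
    # Number pages from 1, as djvused does
    return list(enumerate(files, start=1))
```

Natural sorting keeps page10 after page2, which a plain string sort would not.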

The FED format contains the raw OCR information.

I have translated the original FED format documentation ( http://www.cuneiform.ru/downloads/doc.zip , subfolder "ced") from Russian to English. It is available on my site here:

The description of the ED-format (written 2.11.1998)
http://www.djvu-soft.narod.ru/ed_discr_en.htm

The description of the ED-format ver.2000 (written 16.09.1999)
http://www.djvu-soft.narod.ru/new_ed_en.htm

The idea is to create some kind of converter from FED to the djvused hidden-text format.
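One step any such converter needs is a change of coordinates. DjVu hidden-text boxes are measured in pixels from the bottom-left corner of the page, while raster image coordinates usually start at the top-left. Below is a minimal Python sketch, assuming (not verified against the ED description) that FED boxes use a top-left origin. `Box` and `to_djvu_coords` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical rectangle in image pixels with the origin at the TOP-left corner
    # (an assumption about FED/ED coordinates; check the ED description)
    left: int
    top: int
    right: int
    bottom: int


def to_djvu_coords(box: Box, page_height: int) -> tuple[int, int, int, int]:
    """Return (xmin, ymin, xmax, ymax) with the origin at the BOTTOM-left corner,
    as used by djvused hidden-text coordinates."""
    # x is unchanged; y is mirrored, so the top edge becomes ymax
    return (box.left, page_height - box.bottom, box.right, page_height - box.top)
```

For example, `to_djvu_coords(Box(100, 200, 400, 260), page_height=3000)` returns `(100, 2740, 400, 2800)`.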

See also the djvused documentation: http://djvu.sourceforge.net/doc/man/djvused.html#lbAV (section "Hidden text syntax").
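The hidden-text syntax is one s-expression per page: `(page ...)` containing `(line ...)` elements, which contain `(word xmin ymin xmax ymax "text")` elements. djvused can load it with the `select` and `set-txt` commands. Below is a minimal Python sketch of the output side of a FED converter, assuming the words are already grouped into lines and their boxes are already in DjVu coordinates (bottom-left origin). `page_sexpr` and `djvused_script` are hypothetical helpers:

```python
# Hypothetical word record: (xmin, ymin, xmax, ymax, text) in DjVu coordinates
Word = tuple[int, int, int, int, str]


def escape(text: str) -> str:
    # Escape backslashes first, then double quotes, for a djvused string literal
    return text.replace("\\", "\\\\").replace('"', '\\"')


def page_sexpr(width: int, height: int, lines: list[list[Word]]) -> str:
    """Build a (page ...) hidden-text expression from lines of words."""
    parts = [f"(page 0 0 {width} {height}"]
    for words in lines:
        if not words:
            continue  # skip empty lines rather than emit a meaningless box
        # The line's box is the union of its word boxes
        xmin = min(w[0] for w in words)
        ymin = min(w[1] for w in words)
        xmax = max(w[2] for w in words)
        ymax = max(w[3] for w in words)
        parts.append(f" (line {xmin} {ymin} {xmax} {ymax}")
        for x0, y0, x1, y1, text in words:
            parts.append(f'  (word {x0} {y0} {x1} {y1} "{escape(text)}")')
        parts.append(" )")
    parts.append(")")
    return "\n".join(parts)


def djvused_script(pages: list[tuple[int, str]]) -> str:
    """Wrap (page_number, sexpr) pairs into a script for `djvused -f`."""
    out = []
    for number, sexpr in pages:
        out.append(f"select {number}")  # djvused page numbers start at 1
        out.append("set-txt")           # hidden text follows on the next lines...
        out.append(sexpr)
        out.append(".")                 # ...up to a line holding a single period
    out.append("save")                  # write the modified document back
    return "\n".join(out) + "\n"
```

The resulting script could then be applied with something like `djvused -f script.dsed book.djvu`. Only backslashes and double quotes are escaped here, and the text is assumed to be UTF-8.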

Who could do that?
monday2000
Posts: 74
Joined: Tue Dec 25, 2007 6:34 pm
Location: Rostov-on-Don
