[Question] Looking for Hangul Frequency table
Author: iswhite / Posted: Mon, 2007/03/19 - 11:27 AM
Hello KLDP,
My Korean is very poor, so I'll write this post in English. Sorry.
I want to port a handwriting input method that can read Hangul.
To help increase input speed, I want to add prediction that guesses what the user is writing and what they will write next.
Does anyone here know of any database or table that can predict what a person wants to write, given what they have already written?
You mean the system
You mean the system recognizes handwritten Hangul from user input? Hm, so it is about a pattern-matching mechanism? And for Hangul... that's very interesting.
Yes, a pattern-matching system
Yes, a pattern-matching system for the Korean language, not for the actual characters themselves.
For example, if I write 안녕, it could predict that I want to say 안녕하세요 or 안녕히가세요, or maybe, based on the context, it decides I should have written 안경.
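To make the idea concrete, here is a minimal Python sketch of prefix-based prediction ranked by frequency. It is only an illustration: the function names `build_table` and `predict` are hypothetical, and the sample phrases and their counts are made up, not taken from any real Korean corpus. The context-based correction (안녕 to 안경) is not covered here.

```python
from collections import Counter

def build_table(phrases):
    """Count how often each phrase occurs in a (hypothetical) text sample."""
    return Counter(phrases)

def predict(prefix, table, limit=3):
    """Return up to `limit` phrases starting with `prefix`, most frequent first."""
    candidates = [
        (count, phrase)
        for phrase, count in table.items()
        if phrase.startswith(prefix) and phrase != prefix
    ]
    # Sort by descending count, then by phrase so ties are deterministic.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in candidates[:limit]]

# Made-up sample data, for illustration only.
sample = ["안녕하세요"] * 3 + ["안녕히가세요"] * 2 + ["안경"]
table = build_table(sample)

print(predict("안녕", table))  # ['안녕하세요', '안녕히가세요']
```

A real input method would need a much larger table, which is exactly the frequency data I am looking for.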
AFAIK, that is very common
AFAIK, that is a very common way for users to input Japanese or Chinese. (Are you Japanese or Chinese? ^^)
But it's not common in Korean. Korean input methods don't use language prediction techniques, even on mobile phones.
But I am not sure that no Korean input method uses that kind of technique at all. Maybe some of them do.
I think some other IME professional will answer your question.
Good luck. :)
Taeho Oh ( ohhara@postech.edu , ohhara@plus.or.kr ) http://ohhara.sarang.net
Postech ( Pohang University of Science and Technology ) http://www.postech.edu
Digital Media Professionals Inc. http://www.dmprof.com
Alticast Corp. http://www.alticast.com
Simple Hangul character
Simple Hangul character statistics may be helpful. But I guess you're looking for Hangul character n-gram statistics. Or do you want Korean word n-gram statistics?
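To show what a character n-gram table looks like, here is a rough Python sketch that counts which syllable follows a given context and then ranks the likely next syllables. The names `char_ngram_counts` and `next_syllables` are hypothetical, and the one-line corpus is made up for the example; a usable table would have to be built from a large corpus. Word n-gram statistics work the same way, with word tokens in place of syllables.

```python
from collections import Counter, defaultdict

def char_ngram_counts(text, n=2):
    """Map each (n-1)-syllable context to a Counter of the syllables that follow it."""
    counts = defaultdict(Counter)
    for word in text.split():  # n-grams do not cross word boundaries here
        for i in range(len(word) - n + 1):
            context = word[i:i + n - 1]
            following = word[i + n - 1]
            counts[context][following] += 1
    return counts

def next_syllables(counts, context, limit=3):
    """Return the most frequent syllables seen after `context`."""
    return [syllable for syllable, _ in counts.get(context, Counter()).most_common(limit)]

# Made-up corpus, for illustration only.
corpus = "안녕하세요 안녕하세요 안녕히가세요 안경"
bigrams = char_ngram_counts(corpus, n=2)

print(next_syllables(bigrams, "안"))  # ['녕', '경']
print(next_syllables(bigrams, "녕"))  # ['하', '히']
```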
There are recognizers, spellers, translators, tokenizers, etc. based on the n-gram model. Researchers and organizations have their own frequency tables. I'm afraid, however, that you will not find an open, downloadable n-gram frequency table. I don't know of any. I recommend you ask the researchers personally for the data.
Try to contact:
KAIST built various Korean corpora, and they wrote some articles on your topic using those corpora. Others use the KAIST corpora or have built a corpus based on them. See "Character Corpus" at http://kibs.kaist.ac.kr/english/expert.htm. These corpora are available to the public (I hope).
Prof. Seung-Shik Kang (http://nlp.kookmin.ac.kr/~sskang/) wrote an article on an n-gram based model. He could give you data (if you are lucky) or information on where you can get some.
====
No one asks you for change or directions.
-- Slo-Mo, J. Krokidas