[Question] Looking for Hangul Frequency table

Posted by: iswhite / Posted on: Mon, 2007/03/19 - 11:27 AM

Hello Kldp,

My Korean is very poor, so I'll write this post in English. Sorry.

I want to port a handwriting input method so that it can read Hangul.
To help increase input speed, I want to add prediction that guesses what the user is writing and what they will write next.
Does anyone here know of a database or table that can predict what a person wants to write, given what they have already written?

Forums:

Programming Q&A

You mean the system

Posted by: Anonymous user / Posted on: Mon, 2007/03/19 - 12:12 PM

You mean the system recognizes handwritten Hangul from user input? Hmm, so it is about a pattern matching mechanism? For Hangul, that's very interesting.

Yes, pattern matching system

Posted by: iswhite / Posted on: Mon, 2007/03/19 - 12:23 PM

Yes, a pattern matching system for the Korean language, not for the actual characters themselves.
For example, if I write 안녕, it could predict that I want to say 안녕하세요 or 안녕히가세요, or, based on the context, it might decide that I should have written 안경.
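
To make the 안녕 example concrete, here is a minimal sketch of prefix-based completion from a word frequency table. The counts in `WORD_COUNTS` and the function name `complete` are made up for illustration; a real table would have to be built from a Korean corpus, and this sketch does not attempt the context-based correction (안녕 to 안경) mentioned above.

```python
from collections import Counter

# Hypothetical word counts, invented for this example only.
WORD_COUNTS = Counter({
    "안녕하세요": 50,
    "안녕": 30,
    "안녕히가세요": 20,
    "안경": 5,
})


def complete(prefix, counts, limit=3):
    """Return up to `limit` words that extend `prefix`, most frequent first."""
    matches = [
        (word, count)
        for word, count in counts.items()
        if word.startswith(prefix) and word != prefix
    ]
    # Sort by descending count, then alphabetically to break ties.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in matches[:limit]]


if __name__ == "__main__":
    print(complete("안녕", WORD_COUNTS))
```

A linear scan is fine for a small table; with a large vocabulary a trie or a sorted list with binary search would probably be a better fit.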

AFAIK, that is very common

Posted by: ohhara / Posted on: Mon, 2007/03/19 - 12:35 PM

AFAIK, that is a very common way to input Japanese or Chinese. (Are you Japanese or Chinese? ^^)
But it's not common in Korean. Korean input methods don't use language prediction techniques, even on mobile phones.

But I am not sure that no Korean input method uses that kind of technique at all. Maybe some of them do.

I think some other IME professional can answer your question.

Good luck. :)

Taeho Oh ( ohhara@postech.edu , ohhara@plus.or.kr ) http://ohhara.sarang.net
Postech ( Pohang University of Science and Technology ) http://www.postech.edu
Digital Media Professionals Inc. http://www.dmprof.com

Taeho Oh ( ohhara@postech.edu ) http://ohhara.sarang.net
Postech ( Pohang University of Science and Technology ) http://www.postech.edu
Alticast Corp. http://www.alticast.com

Simply hangul character

Posted by: slomo / Posted on: Mon, 2007/03/19 - 2:13 PM

Simple Hangul character statistics may be helpful. But I guess you're looking for Hangul n-gram statistics. Or do you want Korean word n-gram statistics?

There are recognizers, spellers, translators, tokenizers, etc. based on the n-gram model. Researchers and organizations have their own frequency tables. I'm afraid, however, that you will not find an open, downloadable n-gram frequency table. I don't know of any. I recommend you ask the researchers personally for the data.

Try to contact:

KAIST built various Korean corpora and has written some articles on your topic using them. Others use the KAIST corpora or have built a corpus based on them. See "Character Corpus" at http://kibs.kaist.ac.kr/english/expert.htm. These corpora are available to the public (I hope).

Prof. Seung-Shik Kang (http://nlp.kookmin.ac.kr/~sskang/) wrote an article on an n-gram based model. He could give you data (if you are lucky) or tell you where you can get some.
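
If no ready-made table turns up, a syllable-level bigram table can be counted from any Korean text you are allowed to use. The following is only a rough sketch of that idea: the function names are hypothetical, the sample sentence is a toy, and it works on precomposed Hangul syllables (U+AC00 to U+D7A3) rather than on individual jamo.

```python
from collections import Counter, defaultdict


def is_hangul_syllable(ch):
    """True for precomposed Hangul syllables (U+AC00 through U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def bigram_table(text):
    """Count, for each syllable, which syllables directly follow it."""
    table = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        # Pairs that cross a space or punctuation are skipped.
        if is_hangul_syllable(prev) and is_hangul_syllable(nxt):
            table[prev][nxt] += 1
    return table


def predict_next(table, syllable, limit=3):
    """Return the most frequent followers of `syllable`, best first."""
    followers = table.get(syllable, Counter())
    return [s for s, _ in followers.most_common(limit)]


if __name__ == "__main__":
    sample = "안녕하세요 안녕히가세요 안녕"  # toy text, not a real corpus
    table = bigram_table(sample)
    print(predict_next(table, "안"))
```

The same counting scheme extends to longer n-grams or to word n-grams by changing what is treated as a unit, but the table sizes grow quickly, so a real corpus and some pruning would be needed.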

====
No one asks you for change or directions.
-- Slo-Mo, J. Krokidas
