<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>MeCab: Estimating Parameters from Your Own Dictionary and Corpus</title>
<link type="text/css" rel="stylesheet" href="mecab.css">
</head>
<body>
<h1>Estimating Parameters from Your Own Dictionary and Corpus</h1>

<p>$Id: learn.html 131 2007-06-09 16:18:15Z taku-ku $;</p>

<h2>Overview</h2>

<p>MeCab can estimate its parameters (cost values) from a training corpus.
Because MeCab itself is designed to be independent of any particular part-of-speech system,
you can build an analyzer based on your own part-of-speech system, dictionary, and corpus.
Parameter estimation uses Conditional Random Fields (<a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a>).</p>
<p>

<h2>Processing flow</h2>
<p>The data flow is shown in the following diagram.</p>
<img src="flow.png">

<p>Parameter estimation consists of the following subtasks.</p>
<ul>
<li><a href="#seed">Preparing the seed dictionary</a>
<li><a href="#config">Preparing the configuration files</a>
    <ul>
     <li>dicrc
     <li>char.def
     <li>unk.def
     <li>rewrite.def
     <li>feature.def
    </ul>
<li><a href="#corpus">Preparing the training corpus</a>
<li><a href="#binary">Building the binary dictionary for training</a>
<li><a href="#crf">Training the CRF parameters</a>
<li><a href="#dist">Generating the dictionary for distribution</a>
<li><a href="#test">Building the binary dictionary for analysis</a>
<li><a href="#eval">Evaluation</a>
</ul>

<p>Each subtask is described in turn below.</p>

<h2><a name="seed">Preparing the seed dictionary</a></h2>
<p>MeCab dictionaries are written in CSV. The seed dictionary and the distribution
dictionary use essentially the same format.</p>

<p>The following are example dictionary entries.</p>

<pre>
進学校,0,0,0,名詞,一般,*,*,*,*,進学校,シンガクコウ,シンガクコー
梅暦,0,0,0,名詞,一般,*,*,*,*,梅暦,ウメゴヨミ,ウメゴヨミ
気圧,0,0,0,名詞,一般,*,*,*,*,気圧,キアツ,キアツ
水中翼船,0,0,0,名詞,一般,*,*,*,*,水中翼船,スイチュウヨクセン,スイチューヨクセン
</pre>

<p>The first four columns are mandatory:</p>
<ul>
<li>Surface form (the word itself)
<li>Left connection state ID
<li>Right connection state ID
<li>Cost
</ul>
<p>The left connection state ID, right connection state ID, and cost are not used
in the seed dictionary, so set them to 0.</p>

<p>Columns 5 and onward are items called "features". To keep the system general,
MeCab does not distinguish between the kinds of information attached to a word,
such as part of speech, conjugation, reading, and pronunciation; it treats all of
them uniformly as features. You can attach as many features as CSV allows.
However, each column must carry the same kind of feature in every entry
(for example, column 5 is the part of speech and column 6 is the part-of-speech
subcategory). By convention, the more general features receive the lower feature numbers
(for example: part of speech, part-of-speech subcategory, conjugation type, conjugation form, base form, reading, pronunciation).
</p>

<p>Internally, features are handled as an array, so they are sometimes referred to
as feature 0, feature 1, and so on. Keeping track of which feature number corresponds
to which kind of information (part of speech, reading, etc.) is your responsibility.</p>

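<p>The entries above are taken from ipadic, which defines the following feature sequence:</p>
<ul>
<li>Part of speech
<li>Part-of-speech subcategory 1
<li>Part-of-speech subcategory 2
<li>Part-of-speech subcategory 3
<li>Conjugation type
<li>Conjugation form
<li>Base form
<li>Reading
<li>Pronunciation
</ul>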
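<p>The column layout described above can be illustrated with a short Python sketch. It is not part of MeCab: <code>parse_seed_entry</code> and the English field labels in <code>IPADIC_FEATURES</code> are hypothetical names chosen for this example, and only the ipadic layout listed above is assumed.</p>

```python
import csv

# Labels for the ipadic feature columns listed above. Another dictionary
# would define its own list; these names are illustrative, not MeCab's.
IPADIC_FEATURES = [
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
]


def parse_seed_entry(line):
    """Split one seed-dictionary line into mandatory columns and features."""
    row = next(csv.reader([line]))
    return {
        "surface": row[0],
        "left_id": int(row[1]),
        "right_id": int(row[2]),
        "cost": int(row[3]),
        "features": row[4:],  # features[0] is "feature 0", and so on
    }


entry = parse_seed_entry("気圧,0,0,0,名詞,一般,*,*,*,*,気圧,キアツ,キアツ")
named = dict(zip(IPADIC_FEATURES, entry["features"]))
print(entry["surface"], named["pos"], named["reading"])
```

<p>Keeping the mapping from feature number to meaning in one place, as the list does here, is one way to manage the numbering yourself.</p>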

<p>MeCab does not perform conjugation processing. For words that conjugate, you must
expand every conjugated form in advance.</p>
<pre>
連れ出す,0,0,0,動詞,自立,*,*,五段・サ行,基本形,連れ出す,ツレダス,ツレダス
連れ出さ,0,0,0,動詞,自立,*,*,五段・サ行,未然形,連れ出す,ツレダサ,ツレダサ
連れ出そ,0,0,0,動詞,自立,*,*,五段・サ行,未然ウ接続,連れ出す,ツレダソ,ツレダソ
連れ出し,0,0,0,動詞,自立,*,*,五段・サ行,連用形,連れ出す,ツレダシ,ツレダシ
連れ出せ,0,0,0,動詞,自立,*,*,五段・サ行,仮定形,連れ出す,ツレダセ,ツレダセ
連れ出せ,0,0,0,動詞,自立,*,*,五段・サ行,命令ｅ,連れ出す,ツレダセ,ツレダセ
連れ出しゃ,0,0,0,動詞,自立,*,*,五段・サ行,仮定縮約１,連れ出す,ツレダシャ,ツレダシャ
</pre>
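<p>Expanding conjugations by hand is tedious, so a small generator can help. The following Python sketch reproduces the seven 連れ出す entries above from a stem; <code>expand_godan_sa</code> and the <code>GODAN_SA_FORMS</code> table are hypothetical helpers, and the table covers only the forms that appear in the example, not a complete conjugation paradigm.</p>

```python
# (surface suffix, conjugation form, reading suffix) for the 五段・サ行 class,
# copied from the 連れ出す entries above. Not a complete paradigm.
GODAN_SA_FORMS = [
    ("す", "基本形", "ス"),
    ("さ", "未然形", "サ"),
    ("そ", "未然ウ接続", "ソ"),
    ("し", "連用形", "シ"),
    ("せ", "仮定形", "セ"),
    ("せ", "命令ｅ", "セ"),
    ("しゃ", "仮定縮約１", "シャ"),
]


def expand_godan_sa(stem, reading_stem):
    """Return seed-dictionary lines for every listed form of a サ行 verb."""
    base = stem + "す"
    lines = []
    for suffix, form, reading_suffix in GODAN_SA_FORMS:
        reading = reading_stem + reading_suffix
        fields = [stem + suffix, "0", "0", "0", "動詞", "自立", "*", "*",
                  "五段・サ行", form, base, reading, reading]
        lines.append(",".join(fields))
    return lines


lines = expand_godan_sa("連れ出", "ツレダ")
print("\n".join(lines))
```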

<h2><a name="config">Preparing the configuration files</a></h2>
<h3>dicrc</h3>
<p>
This file controls various aspects of the dictionary's behavior. The following is a minimal configuration.</p>

<pre>
cost-factor = 800
bos-feature = BOS/EOS,*,*,*,*,*,*,*,*
eval-size = 6
unk-eval-size = 4
config-charset = EUC-JP
</pre>

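<ul>
<li>cost-factor: The scaling factor used when converting to cost values.
    A value between 700 and 800 works fine.
<li>bos-feature: The features of the beginning and end of a sentence, written in CSV.
<li>eval-size: For known words, the number of features, counted from the first, that
    must match for a result to be judged correct. For known words it is usually enough that information such as the part of speech and conjugation is correct, so features such as reading and pronunciation are
    ignored. The value is 6 in the example above, so six features of the IPA part-of-speech system are evaluated:
    part of speech, part-of-speech subcategories 1, 2, and 3, conjugation type, and conjugation form.
<li>unk-eval-size: For unknown words, the number of features, counted from the first, that
    must match for a result to be judged correct.
<li>config-charset: The character encoding of the dicrc, char.def, unk.def, and pos-id.def files.
</ul>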
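<p>To make the meaning of eval-size concrete, here is a minimal Python sketch that reads the settings shown above and compares only the leading features of two analyses. <code>parse_dicrc</code> and <code>is_correct</code> are hypothetical names, the comment handling is an assumption, and this is not how MeCab itself is implemented.</p>

```python
def parse_dicrc(text):
    """Read 'key = value' lines into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        # Skipping blank lines and ';' or '#' comment lines is an assumption.
        if not line or line[0] in ";#":
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def is_correct(system, gold, size):
    """True when the first `size` features of both analyses agree."""
    return system[:size] == gold[:size]


DICRC = """\
cost-factor = 800
bos-feature = BOS/EOS,*,*,*,*,*,*,*,*
eval-size = 6
unk-eval-size = 4
config-charset = EUC-JP
"""

config = parse_dicrc(DICRC)
eval_size = int(config["eval-size"])
gold = "助詞,係助詞,*,*,*,*,は,ハ,ワ".split(",")
system = "助詞,係助詞,*,*,*,*,は,ハ,ハ".split(",")  # pronunciation differs
print(is_correct(system, gold, eval_size))
```

<p>With eval-size set to 6, the differing pronunciation in the last column does not affect the comparison, because only features 0 through 5 are examined.</p>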

<h3>char.def</h3>
<p>
This file defines how unknown words are handled. Japanese morphological analysis usually
handles unknown words based on character types. In MeCab you can specify in detail which
characters belong to which character type and, for each character type, what kind of
unknown-word processing is applied.
</p>

<p>
The first part of the file defines the category names and the unknown-word processing
behavior of each category.

<pre>
category-name      invoke(0/1)  group(0/1)  length(0, 1, 2, ... n)
</pre>

<ul>
<li>Category name: The name of the category. <br>
    Define categories such as HIRAGANA, KATAKANA, and so on. DEFAULT and SPACE are mandatory categories.
<li>Invocation timing: <br>
    Defines when unknown-word processing is invoked for the category.
    <ul>
     <li>0: Unknown-word processing is not invoked when a known word is found
     <li>1: Unknown-word processing is always invoked
    </ul>
<li>Grouping: How unknown-word candidates are generated.
    <ul>
     <li>0: Characters of the same type are not grouped together.
     <li>1: Characters of the same type are grouped together.
    </ul>
<li>Length: How unknown-word candidates are generated.
    <ul>
     <li>1: Strings of up to 1 character are treated as unknown words.
     <li>2: Strings of up to 2 characters are treated as unknown words. <br>
	 ... 
     <li>n: Strings of up to n characters are treated as unknown words. <br>
     </ul>
    Grouping and length can be specified together.
</ul>

<p>Example:</p>
<pre>
KANJI          0 0 2
SYMBOL         1 1 0
NUMERIC        1 1 0
ALPHA          1 1 0
HIRAGANA       0 1 2 </pre>
</p>
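<p>The interplay of the grouping and length columns is easier to see in code. The following Python sketch is one reading of the rules above, not MeCab's implementation: <code>Category</code>, <code>parse_category</code>, and <code>unknown_candidates</code> are hypothetical names, and the way the two kinds of candidates are combined is an assumption.</p>

```python
from collections import namedtuple

Category = namedtuple("Category", "name invoke group length")


def parse_category(line):
    """Parse a line such as 'KANJI 0 0 2', ignoring a trailing # comment."""
    name, invoke, group, length = line.split("#", 1)[0].split()
    return Category(name, int(invoke), int(group), int(length))


def unknown_candidates(run, category):
    """Candidate unknown words for `run`, a maximal run of characters
    of one category that starts at the current position."""
    # Length: prefixes of 1 .. length characters.
    candidates = [run[:n] for n in range(1, min(category.length, len(run)) + 1)]
    # Grouping: the whole run of same-type characters.
    if category.group and run and run not in candidates:
        candidates.append(run)
    return candidates


kanji = parse_category("KANJI          0 0 2")
hiragana = parse_category("HIRAGANA       0 1 2 ")
print(unknown_candidates("水中翼船", kanji))
print(unknown_candidates("かなり", hiragana))
```

<p>Under these assumptions, KANJI yields only one- and two-character prefixes, while HIRAGANA additionally yields the whole run because grouping is enabled.</p>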

<p>Next, define which UCS2 code points belong to each category.</p>
<pre>
codepoint default-category compatible-category-1  compatible-category-2 .. 
</pre>
<p>or</p>
<pre>
low_codepoint..high_codepoint default-category compatible-category-1  compatible-category-2 .. 
</pre>

<p>Example:</p>
<pre>
0x0009 SPACE
0x30A1..0x30FF  KATAKANA
0x30FC          KATAKANA HIRAGANA  # ー
</pre>
<p>Code points are UCS2 (Unicode) values written as hexadecimal numbers starting with 0x.</p>

<p>
The first category is the default category of the code point.
Compatible categories can be listed after it. In the example above, the prolonged sound mark "ー"
is katakana by default but also has hiragana as a compatible category.
During grouping, compatible categories are treated as belonging to the same group.
</p>
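<p>The mapping syntax can be sketched in Python as follows. <code>parse_mapping</code> and <code>categories_of</code> are hypothetical names; letting a later line override an earlier one, and falling back to DEFAULT for unmapped code points, are assumptions made for this example rather than documented behavior.</p>

```python
def parse_mapping(lines):
    """Build {code point: [default category, compatible categories...]}."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        low, _, high = fields[0].partition("..")
        first = int(low, 16)
        last = int(high, 16) if high else first
        for codepoint in range(first, last + 1):
            # Assumption: a later definition replaces an earlier one.
            table[codepoint] = fields[1:]
    return table


def categories_of(char, table):
    """Categories of one character; DEFAULT when unmapped (an assumption)."""
    return table.get(ord(char), ["DEFAULT"])


table = parse_mapping([
    "0x0009 SPACE",
    "0x30A1..0x30FF  KATAKANA",
    "0x30FC          KATAKANA HIRAGANA  # ー",
])
print(categories_of("カ", table))
print(categories_of("ー", table))
```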

<p>The following is a concrete example of char.def.</p>
<pre>
DEFAULT        0 1 0  # DEFAULT is a mandatory category!
SPACE          0 1 0  
KANJI          0 0 2
SYMBOL         1 1 0
NUMERIC        1 1 0
ALPHA          1 1 0
HIRAGANA       0 1 2 
KATAKANA       1 1 0
KANJINUMERIC   1 1 0
GREEK          1 1 0
CYRILLIC       1 1 0

# SPACE
0x0020 SPACE  # DO NOT REMOVE THIS LINE,  0x0020 is reserved for SPACE
0x00D0 SPACE
0x0009 SPACE
0x000B SPACE
0x000A SPACE

# ASCII
0x0021..0x002F SYMBOL
0x0030..0x0039 NUMERIC

... 

# KATAKANA
0x30A1..0x30FF  KATAKANA
0x31F0..0x31FF  KATAKANA  # Small KU .. Small RO
0x30FC          KATAKANA HIRAGANA  # ー
</pre>
    
<h3>unk.def</h3>
<p>
The dictionary for unknown words.</p>
<pre>
DEFAULT,0,0,0,記号,一般,*,*,*,*,*
SPACE,0,0,0,記号,空白,*,*,*,*,*
KANJI,0,0,0,名詞,一般,*,*,*,*,*
KANJI,0,0,0,名詞,サ変接続,*,*,*,*,*
HIRAGANA,0,0,0,名詞,一般,*,*,*,*,*
HIRAGANA,0,0,0,名詞,サ変接続,*,*,*,*,*
HIRAGANA,0,0,0,名詞,固有名詞,地域,一般,*,*,*
... 
</pre>

<p>
This is a dictionary file in which the surface-form column holds a category name defined in char.def.
It defines which feature sequences are assigned to each category.
A single category may have more than one feature sequence. Appropriate cost values
are assigned automatically after training.
</p>

<h3>rewrite.def</h3>
<p>
Defines the mapping from a feature sequence to the internal-state feature sequences.
</p>

<p>
<a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> computes its statistics from three kinds of information: unigrams, left-context bigrams, and right-context bigrams.
In the example "美しい川" below, three feature sets are derived from the features defined in the dictionary: the unigram features,
the left-context features (the features of the morpheme as seen from its left side), and
the right-context features (the features of the morpheme as seen from its right side).
rewrite.def defines the mapping from the dictionary features to each of these internal features.
<img src="feature.png">
</p>
<p>Concretely, defining suitable mapping functions makes things like the following possible.</p>
<ul>
<li>Merging the two spellings "来る" and "くる" into "来る" when computing statistics.
<li>Specifying in detail which parts of the features are used when computing connection costs,
    for example using only the part of speech, or lexicalizing.
</ul>
</p>

<p>
rewrite.def has three sections.
</p>
<ul>
 <li>[unigram rewrite]: Mapping to the unigram internal state
 <li>[left rewrite]: Mapping to the left-context bigram
 <li>[right rewrite]: Mapping to the right-context bigram
</ul>

<p>
Each section header is followed by mapping rules, one per line.
A mapping rule is written in the form
</p>
<pre>
match-pattern  replacement
</pre>
<p>
Rules are scanned from the top, and the first one that matches is used.
</p>
<p>
Match patterns can use simple regular expressions.
</p>
<ul>
  <li>*: Matches any string
  <li>(AB|CD|EF): Matches AB, CD, or EF
  <li>AB: Matches only the exact string AB
</ul>
<p>
The replacement can refer to the content of each feature element (each CSV field)
through the macros $1, $2, $3, and so on.
</p>

<p>
Example:
</p>
<pre>
[unigram rewrite]
# Drop the pronunciation; use POS, POS subcategories 1-3, conjugation type, conjugation form, base form, and reading
*,*,*,*,*,*,*,*  $1,$2,$3,$4,$5,$6,$7,$8
# Ignore the reading when it is absent
*,*,*,*,*,*,*    $1,$2,$3,$4,$5,$6,$7,*

[left rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い)    $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ)   $1,$2,$3,$4,$5,$6,よ
...

[right rewrite]
(助詞|助動詞),*,*,*,*,*,(ない|無い)    $1,$2,$3,$4,$5,$6,無い
(助詞|助動詞),終助詞,*,*,*,*,(よ|ヨ)   $1,$2,$3,$4,$5,$6,よ
..
</pre>
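<p>The first-match rule lookup and the <code>$n</code> macros can be sketched in Python as below. This is an illustration of the rules as described, not MeCab's code: <code>field_matches</code> and <code>rewrite</code> are hypothetical names, and treating a pattern as applicable only when the entry has at least as many features as the pattern has fields is an assumption.</p>

```python
import re


def field_matches(pattern, value):
    """Match one CSV field against '*', '(A|B|C)', or a literal string."""
    if pattern == "*":
        return True
    if pattern.startswith("(") and pattern.endswith(")"):
        return value in pattern[1:-1].split("|")
    return pattern == value


def rewrite(features, rules):
    """Return the replacement of the first matching rule, or None."""
    for pattern, replacement in rules:
        fields = pattern.split(",")
        # Assumption: a pattern longer than the feature list cannot match.
        if len(fields) > len(features):
            continue
        if all(field_matches(p, v) for p, v in zip(fields, features)):
            # $1 refers to feature 0, $2 to feature 1, and so on.
            return re.sub(r"\$(\d+)",
                          lambda m: features[int(m.group(1)) - 1],
                          replacement)
    return None


unigram_rules = [
    ("*,*,*,*,*,*,*,*", "$1,$2,$3,$4,$5,$6,$7,$8"),
    ("*,*,*,*,*,*,*", "$1,$2,$3,$4,$5,$6,$7,*"),
]
left_rules = [
    ("(助詞|助動詞),*,*,*,*,*,(ない|無い)", "$1,$2,$3,$4,$5,$6,無い"),
]

nai = "助動詞,*,*,*,特殊・ナイ,基本形,ない,ナイ,ナイ".split(",")
print(rewrite(nai, left_rules))
print(rewrite("名詞,一般,*,*,*,*,気圧".split(","), unigram_rules))
```

<p>In this sketch the auxiliary ない is normalized to 無い for the left context, and an entry without a reading falls through to the second unigram rule.</p>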

<h3>feature.def</h3>
<p>
This file defines the templates used to extract <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> features from the internal-state feature sequences.
</p>

<p>Each line corresponds to one template. Lines beginning with UNIGRAM are unigram
    templates, and lines beginning with BIGRAM are templates for connections.</p>

<p>
The following macros can be used in a template.
</p>
<ul>
<li>%F[n]: Expands to the n-th unigram feature.
<li>%F?[n]: Expands to the n-th unigram feature. If that feature is undefined, the template
    itself is not used.
<li>%t: Expands to the character type, as defined in char.def.
    (%t is valid only in unigram features.)
<li>%L[n]: Expands to the n-th left-context feature.
<li>%L?[n]: Expands to the n-th left-context feature. If that feature is undefined, the template
    itself is not used.
<li>%R[n]: Expands to the n-th right-context feature.
<li>%R?[n]: Expands to the n-th right-context feature. If that feature is undefined, the template
    itself is not used.
</ul>

<p>
Example:
</p>
<pre>
UNIGRAM W0:%F[6]
UNIGRAM W1:%F[0]/%F[6]
UNIGRAM W2:%F[0],%F?[1]/%F[6]
UNIGRAM W3:%F[0],%F[1],%F?[2]/%F[6]
UNIGRAM W4:%F[0],%F[1],%F[2],%F?[3]/%F[6]

UNIGRAM T0:%t
UNIGRAM T1:%F[0]/%t
UNIGRAM T2:%F[0],%F?[1]/%t
UNIGRAM T3:%F[0],%F[1],%F?[2]/%t
UNIGRAM T4:%F[0],%F[1],%F[2],%F?[3]/%t

BIGRAM B00:%L[0]/%R[0]
BIGRAM B01:%L[0],%L?[1]/%R[0]
BIGRAM B02:%L[0]/%R[0],%R?[1]
BIGRAM B03:%L[0]/%R[0],%R[1],%R?[2]
BIGRAM B04:%L[0],%L?[1]/%R[0],%R[1],%R?[2]
BIGRAM B05:%L[0]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B06:%L[0],%L?[1]/%R[0],%R[1],%R[2],%R?[3]
... 
</pre>
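<p>Template expansion can be pictured with the following Python sketch. It is only an illustration: <code>expand</code> is a hypothetical function, "undefined" is interpreted here as a feature that is <code>*</code> or missing (an assumption), and the character type is passed in as a plain string.</p>

```python
import re

MACRO = re.compile(r"%(F|L|R)(\??)\[(\d+)\]|%t")


def expand(template, unigram=None, left=None, right=None, char_type=None):
    """Expand the macros in one template; None means 'template not used'."""
    sources = {"F": unigram, "L": left, "R": right}
    skipped = False

    def substitute(match):
        nonlocal skipped
        if match.group(0) == "%t":
            return char_type
        kind, optional, index = match.group(1), match.group(2), int(match.group(3))
        values = sources[kind]
        value = values[index] if index < len(values) else "*"
        # Assumption: "undefined" means the feature is "*" or absent.
        if optional and value == "*":
            skipped = True
        return value

    result = MACRO.sub(substitute, template)
    return None if skipped else result


unigram = "名詞,一般,*,*,*,*,気圧".split(",")
print(expand("W1:%F[0]/%F[6]", unigram=unigram))
print(expand("W3:%F[0],%F[1],%F?[2]/%F[6]", unigram=unigram))
print(expand("B00:%L[0]/%R[0]", left=["形容詞"], right=["名詞"]))
```

<p>The W3 template returns <code>None</code> here because feature 2 is <code>*</code>, so under the stated assumption that template would simply not fire for this entry.</p>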

<h2><a name="corpus">Preparing the training corpus</a></h2>
<p>The training data is written in the same format as MeCab's default output.
</p>

<pre>
太郎    名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は      助詞,係助詞,*,*,*,*,は,ハ,ワ
花子    名詞,固有名詞,人名,名,*,*,花子,ハナコ,ハナコ
が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
好き    名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
だ      助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
.       記号,句点,*,*,*,*,.,.,.
EOS
焼酎    名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー
好き    名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
親父    名詞,一般,*,*,*,*,親父,オヤジ,オヤジ
.       記号,句点,*,*,*,*,.,.,.
EOS
... 
</pre>

<p>
The first tab-separated field is the surface string. It is followed by the feature array
written as a CSV string. Sentences are delimited by a line containing only EOS.</p>
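<p>A reader for this format takes only a few lines of Python. The sketch below is not part of MeCab; <code>read_corpus</code> is a hypothetical name, and it assumes the surface and the features are separated by a real tab character, as the description says.</p>

```python
def read_corpus(lines):
    """Group corpus lines into sentences of (surface, features) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line == "EOS":          # a line containing only EOS ends a sentence
            sentences.append(current)
            current = []
        elif line:
            surface, feature_csv = line.split("\t", 1)
            current.append((surface, feature_csv.split(",")))
    return sentences


sample = [
    "焼酎\t名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー\n",
    "好き\t名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ\n",
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n",
    "親父\t名詞,一般,*,*,*,*,親父,オヤジ,オヤジ\n",
    "EOS\n",
]
sentences = read_corpus(sample)
print(len(sentences), [surface for surface, _ in sentences[0]])
```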

<h2><a name="binary">Building the binary dictionary for training</a></h2>

<p>Let WORK be the current working directory. Create two directories, seed and final,
under WORK.</p>

<pre>
cd $WORK
mkdir seed final
</pre>

<p>Copy the following files, described above, into the seed directory.</p>
<ul>
 <li>Seed dictionary (a set of CSV files)
 <li>All configuration files (dicrc, char.def, unk.def, rewrite.def, feature.def)
 <li>Training data (file name: corpus)
</ul>

<p>
Example:
<pre>
% cd $WORK/seed
% ls 
Adj.csv          Interjection.csv   Noun.name.csv    Noun.verbal.csv  Symbol.csv        rewrite.def
Adnominal.csv    Noun.adjv.csv      Noun.number.csv  Others.csv       Verb.csv          unk.def
Adverb.csv       Noun.adverbal.csv  Noun.org.csv     Postp-col.csv    char.def
Auxil.csv        Noun.csv           Noun.others.csv  Postp.csv        corpus
Conjunction.csv  Noun.demonst.csv   Noun.place.csv   Prefix.csv       dicrc
Filler.csv       Noun.nai.csv       Noun.proper.csv  Suffix.csv       feature.def
</pre>
</p>
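<p>Before building the dictionary it can be worth checking that the seed directory is complete. The following Python sketch does so for the files named in this section; <code>missing_seed_files</code> is a hypothetical helper, not a MeCab tool.</p>

```python
from pathlib import Path

# Files this section says must be present in the seed directory.
REQUIRED_FILES = ["dicrc", "char.def", "unk.def", "rewrite.def",
                  "feature.def", "corpus"]


def missing_seed_files(seed_dir):
    """Return the names of required files that are absent from seed_dir."""
    seed = Path(seed_dir)
    if not seed.is_dir():
        return REQUIRED_FILES + ["*.csv"]
    missing = [name for name in REQUIRED_FILES if not (seed / name).is_file()]
    if not any(seed.glob("*.csv")):  # at least one seed dictionary CSV
        missing.append("*.csv")
    return missing


print(missing_seed_files("seed"))
```

<p>An empty list means every file listed above was found.</p>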

<p>Run the following commands to build the binary dictionary for training.</p>
<pre>
% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-dict-index

# The directories can also be specified with -d and -o:
% /usr/local/libexec/mecab/mecab-dict-index -d $WORK/seed -o $WORK/seed
</pre>

<ul>
<li>-d: Directory containing the seed dictionary and configuration files (defaults to the current directory)
<li>-o: Directory where the binary dictionary for training is written (defaults to the current directory)
</ul>

<h2><a name="crf">Training the CRF parameters</a></h2>
<pre>
% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-cost-train -c 1.0 corpus model

# The dictionary can also be specified with -d:
% /usr/local/libexec/mecab/mecab-cost-train -d $WORK/seed -c 1.0 $WORK/seed/corpus $WORK/seed/model
</pre>

<ul>
<li>-d: Directory containing the binary dictionary for training (defaults to the current directory)
<li>-c: <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> hyperparameter
<li>-f: Feature frequency threshold
<li>-p NUM: Run training with NUM parallel workers (default is 1)
<li>corpus: File name of the training data
<li>model: File name of the <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> parameter file to be written
</ul>

<p>
mecab-cost-train consumes a large amount of memory while building the binary model.
Memory consumption can be reduced by building the binary model in a separate process,
as follows.
</p>
<pre>
% /usr/local/libexec/mecab/mecab-cost-train -y -c 1.0 corpus model
% /usr/local/libexec/mecab/mecab-cost-train -b model.txt model
</pre>

<p>
The hyperparameter C determines the "strength" of training.
A larger C fits the training data as closely as possible
but risks overfitting. A smaller C avoids overfitting but may fail to learn enough.
A suitable C can only be found heuristically, using a model selection method such as cross-validation.
The default value is 1.0.</p>
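<p>As a rough illustration of the cross-validation mentioned above, the following Python sketch splits a list of training sentences into folds. <code>k_fold</code> is a hypothetical helper and the candidate values in <code>CANDIDATE_C</code> are arbitrary examples, not recommendations.</p>

```python
def k_fold(sentences, k):
    """Yield (train, held_out) pairs; sentence i is held out in fold i % k."""
    for fold in range(k):
        held_out = [s for i, s in enumerate(sentences) if i % k == fold]
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        yield train, held_out


CANDIDATE_C = [0.1, 0.5, 1.0, 2.0, 5.0]  # illustrative values only

sentences = [f"sentence-{i}" for i in range(10)]
for train, held_out in k_fold(sentences, 5):
    print(len(train), len(held_out))
```

<p>For each candidate C and each fold, one would write the training part to a corpus file, train with that value of -c, evaluate on the held-out part, and keep the C with the best average score.</p>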

<p>
The -f option sets the feature frequency threshold. For example, with -f 3
only features that occur three or more times in the training data are used. A suitable
threshold can only be found heuristically, using a model selection method such as cross-validation.
</p>
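<p>The effect of a frequency threshold can be shown with a small Python sketch. <code>frequent_features</code> is a hypothetical function and the feature strings are made up for illustration; how mecab-cost-train counts features internally is not shown here.</p>

```python
from collections import Counter


def frequent_features(observations, threshold):
    """Keep the features that occur at least `threshold` times overall."""
    counts = Counter(f for features in observations for f in features)
    return {f for f, n in counts.items() if n >= threshold}


# Each inner list holds the features observed at one position (made-up data).
observations = [
    ["W0:川", "W1:名詞/川"],
    ["W0:川", "W1:名詞/川"],
    ["W0:川", "W1:名詞/山"],
]
kept = frequent_features(observations, 3)
print(kept)
```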

<p>
During training, information like the following is printed.
<pre>
reading corpus ... adding virtual node: 名詞,固有名詞,地域,一般,*,*,東日,トウニチ,トウニチ
adding virtual node: 副詞,助詞類接続,*,*,*,*,かなり,カナリ,カナリ

Number of sentences: 32
Number of features:  47547
eta:                 0.00010
freq:                1
C(sigma^2):          1.00000

iter=0 err=1.00000 F=0.41186 target=1691.68869 diff=1.00000
iter=1 err=1.00000 F=0.68727 target=1077.14848 diff=0.36327
iter=2 err=0.87500 F=0.81904 target=621.20311 diff=0.42329
iter=3 err=0.81250 F=0.86354 target=384.72432 diff=0.38068
iter=4 err=0.68750 F=0.93685 target=233.72722 diff=0.39248
..
</pre>
<ul>
<li>adding virtual node: A morpheme that could not be handled even by unknown-word
    processing and is therefore added provisionally for training.
<li>iter: Number of training iterations
<li>err: Sentence-level error rate
<li>F: F-measure (harmonic mean of precision and recall)
<li>target: Value of the objective function. Training ends when this value converges.
<li>diff: Relative change in the objective function. Training ends when this value
    reaches 0.0001.
</ul>
</p>
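<p>The progress lines are easy to parse, which is handy for plotting or for checking convergence. The Python sketch below reads the sample lines above and recomputes the relative change of the objective; <code>parse_progress</code> and <code>relative_change</code> are hypothetical names, and the formula |previous - current| / previous is an assumption that happens to agree with the figures in the sample log.</p>

```python
import re

ITER_LINE = re.compile(
    r"iter=(\d+) err=([\d.]+) F=([\d.]+) target=([\d.]+) diff=([\d.]+)"
)


def parse_progress(log_text):
    """Extract one dict per 'iter=...' line of the training log."""
    rows = []
    for match in ITER_LINE.finditer(log_text):
        rows.append({
            "iter": int(match.group(1)),
            "err": float(match.group(2)),
            "F": float(match.group(3)),
            "target": float(match.group(4)),
            "diff": float(match.group(5)),
        })
    return rows


def relative_change(previous, current):
    """Relative change of the objective between two iterations."""
    return abs(previous - current) / previous


LOG = """\
iter=0 err=1.00000 F=0.41186 target=1691.68869 diff=1.00000
iter=1 err=1.00000 F=0.68727 target=1077.14848 diff=0.36327
iter=2 err=0.87500 F=0.81904 target=621.20311 diff=0.42329
iter=3 err=0.81250 F=0.86354 target=384.72432 diff=0.38068
iter=4 err=0.68750 F=0.93685 target=233.72722 diff=0.39248
"""

rows = parse_progress(LOG)
for prev, cur in zip(rows, rows[1:]):
    recomputed = relative_change(prev["target"], cur["target"])
    print(cur["iter"], f"{recomputed:.5f}", cur["diff"])
```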

<p>
<h2><a name="dist">Generating the dictionary for distribution</a></h2>
<pre>
% cd $WORK/seed
% /usr/local/libexec/mecab/mecab-dict-gen -o ../final -m model

# The directories can also be specified with -d and -o:
% /usr/local/libexec/mecab/mecab-dict-gen -o $WORK/final -d $WORK/seed -m $WORK/seed/model
</pre>

<ul>
<li>-d: Directory containing the seed dictionary (defaults to the current directory)
<li>-o: Output directory for the distribution dictionary
<li>-m: <a href="http://www.cis.upenn.edu/~pereira/papers/crf.pdf">CRF</a> parameter file
</ul>

<p>The distribution dictionary must be written to a directory different from that of the seed dictionary.
Normally, the distribution dictionary directory final is archived and distributed to users.</p>

<h2><a name="test">Building the binary dictionary for analysis</a></h2>

<pre>
% cd $WORK/final
% /usr/local/libexec/mecab/mecab-dict-index 

# The directories can also be specified with -d and -o:
% /usr/local/libexec/mecab/mecab-dict-index -d $WORK/final -o $WORK/final
</pre>

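<p>
<ul>
<li>-d: Directory containing the distribution dictionary and configuration files (defaults to the current directory)
<li>-o: Directory where the binary dictionary for analysis is written (defaults to the current directory)
</ul>

<p>Now try analyzing some text with the dictionary you have just built.</p>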
<pre>
% mecab -d $WORK/final
焼酎好きの親父.
焼酎    名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー
好き    名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
親父    名詞,一般,*,*,*,*,親父,オヤジ,オヤジ
.       記号,句点,*,*,*,*,.,.,.
EOS
</pre>
</p>

<h2><a name="eval">Evaluation</a></h2>
<p>
Prepare test data. The test data is written in the same format as
MeCab's default output.
</p>

<p>
First, use mecab-test-gen to extract only the sentences (test.sen) from the test corpus (test).
<pre>
% /usr/local/libexec/mecab/mecab-test-gen &lt; test &gt; test.sen
</pre>
</p>
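<p>Conceptually, this step strips the annotations and leaves raw sentences. The Python sketch below shows that idea; it is not the mecab-test-gen program, <code>corpus_to_sentences</code> is a hypothetical name, and the output layout (one sentence per line, surfaces concatenated without spaces) is an assumption.</p>

```python
def corpus_to_sentences(lines):
    """Rebuild raw sentences by joining the surface forms up to each EOS."""
    sentences, surfaces = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line == "EOS":
            sentences.append("".join(surfaces))
            surfaces = []
        elif line:
            surfaces.append(line.split("\t", 1)[0])  # text before the tab
    return sentences


test_corpus = [
    "焼酎\t名詞,一般,*,*,*,*,焼酎,ショウチュウ,ショーチュー\n",
    "好き\t名詞,形容動詞語幹,*,*,*,*,好き,スキ,スキ\n",
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n",
    "親父\t名詞,一般,*,*,*,*,親父,オヤジ,オヤジ\n",
    ".\t記号,句点,*,*,*,*,.,.,.\n",
    "EOS\n",
]
raw = corpus_to_sentences(test_corpus)
print("\n".join(raw))
```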

<p>Analyze test.sen with the dictionary built above.</p>
<pre>
% mecab -d $WORK/final test.sen > test.result
</pre>
</p>

<p>
Run the evaluation script mecab-system-eval.
The first argument is the system output and the second is the gold-standard file.
<pre>
% /usr/local/libexec/mecab/mecab-system-eval test.result test
                    precision          recall              F
LEVEL 0:    98.6887(647112/655710) 98.9793(647112/653785) 98.8338
LEVEL 1:    98.2163(644014/655710) 98.5055(644014/653785) 98.3607
LEVEL 2:    97.2230(637501/655710) 97.5093(637501/653785) 97.3659
LEVEL 4:    96.8367(634968/655710) 97.1218(634968/653785) 96.9791
</pre>
</p>
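<p>The three columns of the report are related in the usual way. The Python sketch below recomputes the LEVEL 0 row from its counts; <code>precision_recall_f</code> is a hypothetical name, and reading the two denominators as the number of morphemes in the system output and in the gold standard is an assumption based on the column headings.</p>

```python
def precision_recall_f(correct, system_total, gold_total):
    """Precision, recall, and their harmonic mean (F), as fractions."""
    precision = correct / system_total
    recall = correct / gold_total
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Counts taken from the LEVEL 0 row above.
p, r, f = precision_recall_f(647112, 655710, 653785)
print(f"{100 * p:.4f} {100 * r:.4f} {100 * f:.4f}")
```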

<p>The -l option selects which feature levels are used for evaluation.</p>
<ul>
  <li>-l 0: Evaluate using feature 0 only.
  <li>-l 4: Evaluate using features 0 through 4.
  <li>-l -1: Evaluate using the features of all levels.
  <li>-l "0 1 4": Show three evaluations: feature 0, features 0 through 1, and features 0 through 4.
  <li>-l "0 1 -1": Show three evaluations: feature 0, features 0 through 1, and all levels.
</ul>

</body>
</html>