Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > 6a7802c6c3139de476e21dfe96b215ad > files > 27

brilltagger-1.14-10mdv2010.0.i586.rpm

===============================================================================
Copyright (c) 1993 Massachusetts Institute of Technology and
University of Pennsylvania.  All rights reserved.
This program was written by Eric Brill (brill@goldilocks.lcs.mit.edu)
(After July 1 1994, my email address will be brill@blaze.cs.jhu.edu)
Feel free to contact me with any questions, comments, . . .
===============================================================================
THIS SOFTWARE IS PROVIDED "AS IS", AND M.I.T. MAKES NO REPRESENTATIONS 
OR WARRANTIES, EXPRESS OR IMPLIED.  By way of example, but not 
limitation, M.I.T. MAKES NO REPRESENTATIONS OR WARRANTIES OF 
MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF 
THE LICENSED SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY 
PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
=============================================================================
Please Read The COPYRIGHT file included with the tagger.
=============================================================================

Code for training and tagging in n-best mode is provided with this
release.  This code is still under development, and is provided in
prerelease form, in case anybody may have use for it.  Hopefully, in
the next release, we will clean up this code, better document it, etc.
So, feel free to use it, but please keep in mind its "under
development" status.

N-best tagging takes place after 1-best tagging.  So, first, text is
passed through the start state tagger, which tags every word with its
most likely tag (using rules to determine the most likely tag for
unkown words).  Then, the text is passed through the final state
annotator, which changes some tags based on contextual cues.  After
this, the text can be passed through the nbest tagger, which will add
tags to words in certain cases of uncertainty.

Training the n-best tagger:

cd Bin_and_Data

kbest-contextual-rule-learn CORRECTLY-TAGGED-CORPUS GUESS-CORPUS \
	FILE-FOR-RULES LEXICON

where CORRECTLY-TAGGED-CORPUS is the manually tagged gold standard,
GUESS-CORPUS is the output of the 1-best tagger when run on the same
text, FILE-FOR-RULES is the name of the file the rules will be written
into, and LEXICON is the lexicon.  Each entry of the lexicon looks
like:

word tag1 tag2  . . . tagn

where tag1 is the most likely tag for word in the training corpus, and
tag2 through tagn are other tags seen with word in the training
corpus, in no particular order.

Using the nbest tagger:

cat 1-BEST-TAGGED-FILE | nbest-tagger RULEFILE LEXICON 

where 1-BEST-TAGGED-FILE is the output of the 1-best tagger, RULEFILE
is the list of n-best rules learned using kbest-contextual-rule-learn,
and LEXICON is the lexicon.