  
  6 String and Text Utilities
  
  
  6.1 Text Utilities
  
  This section describes some utility functions for handling texts within GAP.
  They  are  used by the functions in the GAPDoc package but may be useful for
  other  purposes  as  well.  We  start  with some variables containing useful
  strings and go on with functions for parsing and reformatting text.
  
  6.1-1 WHITESPACE
  
  > WHITESPACE_________________________________________________global variable
  > CAPITALLETTERS_____________________________________________global variable
  > SMALLLETTERS_______________________________________________global variable
  > LETTERS____________________________________________________global variable
  > DIGITS_____________________________________________________global variable
  > HEXDIGITS__________________________________________________global variable
  
  These  variables  contain  sets  of  characters  which  are  useful for text
  processing. They are defined as follows.
  
  WHITESPACE
        " \n\t\r"
  
  CAPITALLETTERS
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  
  SMALLLETTERS
        "abcdefghijklmnopqrstuvwxyz"
  
  LETTERS
        concatenation of CAPITALLETTERS and SMALLLETTERS
  
  DIGITS
        "0123456789"
  
  HEXDIGITS
        "0123456789ABCDEFabcdef"
  
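  As a small illustration (a sketch, not part of the GAPDoc sources; the
  variable names are chosen for this example only), these character lists
  can be used for membership tests with in, and as the strip argument of
  StripBeginEnd (6.1-6):

```gap
LoadPackage("GAPDoc");

# The variables are lists of characters, so `in` tests membership.
onlyDigits := ForAll("2010", c -> c in DIGITS);
onlyHex := ForAll("5e18", c -> c in HEXDIGITS);

# Remove leading and trailing whitespace; inner blanks are kept.
trimmed := StripBeginEnd("  \t hello world \n", WHITESPACE);

# Keep only the letters of a string.
letters := Filtered("gap-4.4.12", c -> c in LETTERS);

Print(onlyDigits, " ", onlyHex, " ", trimmed, " ", letters, "\n");
```
  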
  6.1-2 TextAttr
  
  > TextAttr___________________________________________________global variable
  
  The  record  TextAttr  contains  strings  which can be printed to change the
  terminal  attribute  for  the  following  characters.  This  only works with
  terminals  which  understand  basic ANSI escape sequences. Try the following
  example  to see if this is the case for the terminal you are using. It shows
  the effect of the foreground and background color attributes and of the
  attributes .bold, .blink, .normal, .reverse and .underscore, which can
  partly be mixed.
  
  ---------------------------  Example  ----------------------------
    extra := ["CSI", "reset", "delline", "home"];;
    for t in Difference(RecNames(TextAttr), extra) do
      Print(TextAttr.(t), "TextAttr.", t, TextAttr.reset,"\n");
    od;
  ------------------------------------------------------------------
  
  The  suggested  defaults for colors 0..7 are black, red, green, brown, blue,
  magenta,   cyan,  white.  But  this  may  be  different  for  your  terminal
  configuration.
  
  The  escape  sequence  .delline  deletes the content of the current line and
  .home moves the cursor to the beginning of the current line.
  
  ---------------------------  Example  ----------------------------
    for i in [1..5] do 
      Print(TextAttr.home, TextAttr.delline, String(i,-6), "\c"); 
      Sleep(1); 
    od;
  ------------------------------------------------------------------
  
  Whenever you use this in printing routines, you should make it optional:
  use these attributes only when the variable ANSI_COLORS has the value true.
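
  One way to follow this advice is sketched below. The helper name
  MaybeWrapTextAttribute is hypothetical and not part of GAPDoc. The sketch
  reads ANSI_COLORS through IsBoundGlobal and ValueGlobal, on the assumption
  that the variable may be unbound in some installations; in that case the
  text is returned without markup.

```gap
LoadPackage("GAPDoc");

# Hypothetical helper: apply attr only if ANSI_COLORS is bound and true.
MaybeWrapTextAttribute := function(str, attr)
  if IsBoundGlobal("ANSI_COLORS") and ValueGlobal("ANSI_COLORS") = true then
    return WrapTextAttribute(str, attr);
  fi;
  return str;
end;

msg := MaybeWrapTextAttribute("warning", TextAttr.bold);
Print(msg, "\n");
```

  With this pattern the printing routine itself does not need to know
  whether the terminal understands the escape sequences.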
  
  6.1-3 WrapTextAttribute
  
  > WrapTextAttribute( str, attr ) ___________________________________function
  Returns:  a string with markup
  
  The  argument  str  must  be  a  text as GAP string, possibly with markup by
  escape  sequences  as  in  TextAttr  (6.1-2). This function returns a string
  which  is  wrapped by the escape sequences attr and TextAttr.reset. It takes
  care  of  markup in the given string by appending attr also after each given
  TextAttr.reset in str.
  
  ---------------------------  Example  ----------------------------
    gap> str := Concatenation("XXX",TextAttr.2, "BLUB", TextAttr.reset,"YYY");
    "XXX\033[32mBLUB\033[0mYYY"
    gap> str2 := WrapTextAttribute(str, TextAttr.1);
    "\033[31mXXX\033[32mBLUB\033[0m\033[31mYYY\033[0m"
    gap> str3 := WrapTextAttribute(str, TextAttr.underscore);
    "\033[4mXXX\033[32mBLUB\033[0m\033[4mYYY\033[0m"
    gap> # use Print(str); and so on to see what it looks like.
  ------------------------------------------------------------------
  
  6.1-4 FormatParagraph
  
  > FormatParagraph( str[, len][, flush][, attr][, widthfun] ) _______function
  Returns:  the formatted paragraph as string
  
  This  function  formats  a  text given in the string str as a paragraph. The
  optional arguments have the following meaning:
  
  len
        the length of the lines of the resulting text (default is 78)
  
  flush
        can  be "left", "right", "center" or "both", telling that lines should
        be  flushed  left,  flushed  right,  centered or left-right justified,
        respectively (default is "both")
  
  attr
        is  a  list  of  two  strings;  the  first is prepended and the second
        appended  to  each  line  of  the  result (can for example be used for
        indenting, [" ", ""], or some markup, [TextAttr.bold, TextAttr.reset],
        default is ["", ""])
  
  widthfun
        must be a function which returns the display width of text in str. The
        default  is  Length assuming that each byte corresponds to a character
        of  width  one.  If  str  is  given  in  UTF-8  encoding  one  can use
        WidthUTF8String (6.2-3) here.
  
  This  function tries to handle markup with the escape sequences explained in
  TextAttr (6.1-2) correctly.
  
  ---------------------------  Example  ----------------------------
    gap> str := "One two three four five six seven eight nine ten eleven.";;
    gap> Print(FormatParagraph(str, 25, "left", ["/* ", " */"]));           
    /* One two three four five */
    /* six seven eight nine ten */
    /* eleven. */
  ------------------------------------------------------------------
  
  6.1-5 SubstitutionSublist
  
  > SubstitutionSublist( list, sublist, new[, flag] ) ________________function
  Returns:  the changed list
  
  This  function  looks for (non-overlapping) occurrences of a sublist sublist
  in  a  list  list (compare PositionSublist (Reference: PositionSublist)) and
  returns a list where these are substituted with the list new.
  
  The  optional  argument flag can either be "all" (this is the default if not
  given)  or "one". In the second case only the first occurrence of sublist is
  substituted.
  
  If  sublist  does  not occur in list then list itself is returned (and not a
  ShallowCopy(list)).
  
  ---------------------------  Example  ----------------------------
    gap> SubstitutionSublist("xababx", "ab", "a");
    "xaax"
  ------------------------------------------------------------------
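
  The effect of the optional argument flag and the remark about the returned
  object can be checked as follows (a short sketch; the variable names are
  chosen for this example only):

```gap
LoadPackage("GAPDoc");

s := "xababx";
resAll := SubstitutionSublist(s, "ab", "a");         # every occurrence
resOne := SubstitutionSublist(s, "ab", "a", "one");  # first occurrence only

# If nothing is found, the very same object is returned, not a copy.
same := IsIdenticalObj(SubstitutionSublist(s, "zz", "a"), s);

Print(resAll, " ", resOne, " ", same, "\n");
```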
  
  6.1-6 StripBeginEnd
  
  > StripBeginEnd( list, strip ) _____________________________________function
  Returns:  changed string
  
  Here list and strip must be lists. This function returns the sublist of list
  which does not contain the leading and trailing entries which are entries of
  strip. If the result is equal to list then list itself is returned.
  
  ---------------------------  Example  ----------------------------
    gap> StripBeginEnd(" ,a, b,c,   ", ", ");
    "a, b,c"
  ------------------------------------------------------------------
  
  6.1-7 StripEscapeSequences
  
  > StripEscapeSequences( str ) ______________________________________function
  Returns:  string without escape sequences
  
  This  function  returns  the string one gets from the string str by removing
  all  escape  sequences  which are explained in TextAttr (6.1-2). If str does
  not contain such a sequence then str itself is returned.
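
  For illustration, the following sketch removes the markup from the string
  used in the example for WrapTextAttribute (6.1-3). The variable names are
  chosen for this example only.

```gap
LoadPackage("GAPDoc");

marked := Concatenation("XXX", TextAttr.2, "BLUB", TextAttr.reset, "YYY");
plain := StripEscapeSequences(marked);

# Length counts the bytes of the escape sequences as well, so the string
# with markup is longer than the text visible on the terminal.
Print(Length(marked), " bytes, ", Length(plain), " visible characters\n");
```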
  
  6.1-8 RepeatedString
  
  > RepeatedString( c, len ) _________________________________________function
  
  Here  c  must  be  either  a character or a string and len is a non-negative
  number.  Then  RepeatedString  returns  a string of length len consisting of
  copies of c.
  
  ---------------------------  Example  ----------------------------
    gap> RepeatedString('=',51);
    "==================================================="
    gap> RepeatedString("*=",51);
    "*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*"
  ------------------------------------------------------------------
  
  6.1-9 NumberDigits
  
  > NumberDigits( str, base ) ________________________________________function
  Returns:  integer
  
  > DigitsNumber( n, base ) __________________________________________function
  Returns:  string
  
  The  argument  str  of  NumberDigits  must be a string consisting only of an
  optional leading '-' and characters in 0123456789abcdefABCDEF, describing an
  integer  in  base  base  with  2  <=  base  <= 16. This function returns the
  corresponding integer.
  
  The function DigitsNumber does the reverse.
  
  ---------------------------  Example  ----------------------------
    gap> NumberDigits("1A3F",16);
    6719
    gap> DigitsNumber(6719, 16);
    "1A3F"
  ------------------------------------------------------------------
  
  6.1-10 PositionMatchingDelimiter
  
  > PositionMatchingDelimiter( str, delim, pos ) _____________________function
  Returns:  position as integer or fail
  
  Here  str must be a string and delim a string with two different characters.
  This function searches for the smallest position r of the character delim[2]
  in str such that the number of occurrences of delim[2] in str between
  positions pos+1 and r is one greater than the corresponding number of
  occurrences of delim[1].
  
  If such an r exists, it is returned. Otherwise fail is returned.
  
  ---------------------------  Example  ----------------------------
    gap> PositionMatchingDelimiter("{}x{ab{c}d}", "{}", 0);
    fail
    gap> PositionMatchingDelimiter("{}x{ab{c}d}", "{}", 1);
    2
    gap> PositionMatchingDelimiter("{}x{ab{c}d}", "{}", 6);
    11
  ------------------------------------------------------------------
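
  A typical use is to extract the text enclosed by a pair of delimiters. The
  sketch below passes the position of an opening delimiter as pos, so that
  the result is the position of its closing partner; the variable names are
  chosen for this example only.

```gap
LoadPackage("GAPDoc");

str := "f{a{b}c}g";
lpos := Position(str, '{');                          # first opening brace
rpos := PositionMatchingDelimiter(str, "{}", lpos);  # its closing partner
if rpos <> fail then
  inner := str{[lpos + 1 .. rpos - 1]};              # text between the two
  Print(lpos, " ", rpos, " ", inner, "\n");
fi;
```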
  
  6.1-11 WordsString
  
  > WordsString( str ) _______________________________________________function
  Returns:  list of strings containing the words
  
  This  returns  the  list  of  words  of a text stored in the string str. All
  non-letters are considered as word boundaries and are removed.
  
  ---------------------------  Example  ----------------------------
    gap> WordsString("one_two \n    three!?");
    [ "one", "two", "three" ]
  ------------------------------------------------------------------
  
  6.1-12 Base64String
  
  > Base64String( str ) ______________________________________________function
  > StringBase64( bstr ) _____________________________________________function
  Returns:  a string
  
  The  first  function  translates arbitrary binary data given as a GAP string
  into  a  base 64 encoded string. This encoded string contains only printable
  ASCII  characters  and  is  used  in  various  data transfer protocols (MIME
  encoded  emails, weak password encryption, ...). We use the specification in
  RFC 2045 (http://tools.ietf.org/html/rfc2045).
  
  The  second  function has the reverse functionality. Here we also accept the
  characters -_ instead of +/ as last two characters. Whitespace is ignored.
  
  ---------------------------  Example  ----------------------------
    gap> b := Base64String("This is a secret!");
    "VGhpcyBpcyBhIHNlY3JldCEA="
    gap> StringBase64(b);                       
    "This is a secret!"
  ------------------------------------------------------------------
  
  
  6.2 Unicode Strings
  
  The  GAPDoc  package provides some tools to deal with unicode characters and
  strings.  These  can  be  used  for  recoding  text  strings between various
  encodings.
  
  
  6.2-1 Unicode Strings and Characters
  
  > Unicode( list[, encoding] ) _____________________________________operation
  > UChar( num ) ____________________________________________________operation
  > IsUnicodeString_____________________________________________________filter
  > IsUnicodeCharacter__________________________________________________filter
  > IntListUnicodeString( ustr ) _____________________________________function
  
  Unicode characters are described by their codepoint, an integer in the range
  from 0 to 2^21-1. For details about unicode, see http://www.unicode.org.
  
  The  function  UChar  wraps  an  integer  num into a GAP object lying in the
  filter  IsUnicodeCharacter.  Use Int to get the codepoint back. The argument
  num  can  also be a GAP character which is then translated to an integer via
  INT_CHAR (Reference: INT_CHAR).
  
  Unicode  produces  a  GAP  object  in  the filter IsUnicodeString. This is a
  wrapped  list  of  integers  for  the  unicode characters in the string. The
  function  IntListUnicodeString  gives access to this list of integers. Basic
  list  functionality  is  available for IsUnicodeString elements. The entries
  are in IsUnicodeCharacter. The argument list for Unicode is either a list of
  integers or a GAP string. In the latter case an encoding can be specified as
  string, its default is "UTF-8".
  
  Currently       supported      encodings      can      be      found      in
  UNICODE_RECODE.NormalizedEncodings  (ASCII,  ISO-8859-X, UTF-8 and aliases).
  The encoding "XML" means an ASCII encoding in which non-ASCII characters are
  specified by XML character entities. The encoding "URL" is for URL-encoded
  (also called percent-encoded) strings, as specified in RFC 3986
  (http://www.ietf.org/rfc/rfc3986.txt). The listed encodings "LaTeX" and
  aliases  cannot  be  used with Unicode. See the operation Encode (6.2-2) for
  mapping a unicode string to a GAP string.
  
  ---------------------------  Example  ----------------------------
    gap> ustr := Unicode("a and \366", "latin1");
    Unicode("a and ö")
    gap> ustr = Unicode("a and &#246;", "XML");  
    true
    gap> IntListUnicodeString(ustr);
    [ 97, 32, 97, 110, 100, 32, 246 ]
    gap> ustr[7];
    'ö'
  ------------------------------------------------------------------
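
  The next sketch shows the remaining ways to create these objects that are
  described above: UChar with an integer or a GAP character, and Unicode
  with a list of codepoints. The variable names are chosen for this example
  only.

```gap
LoadPackage("GAPDoc");

uc := UChar(246);               # wrap the codepoint of o-umlaut
cp := Int(uc);                  # get the codepoint back
fromChar := Int(UChar('a'));    # a GAP character is translated first

ustr := Unicode([97, 32, 246]); # unicode string from a list of codepoints
ints := IntListUnicodeString(ustr);

Print(cp, " ", fromChar, " ", ints, " ", Length(ustr), "\n");
```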
  
  6.2-2 Encode
  
  > Encode( ustr[, encoding] ) ______________________________________operation
  Returns:  a GAP string
  
  > SimplifiedUnicodeString( ustr[, encoding][, "single"] ) __________function
  Returns:  a unicode string
  
  > LowercaseUnicodeString( ustr ) ___________________________________function
  Returns:  a unicode string
  
  > UppercaseUnicodeString( ustr ) ___________________________________function
  Returns:  a unicode string
  
  > LaTeXUnicodeTable__________________________________________global variable
  > SimplifiedUnicodeTable_____________________________________global variable
  > LowercaseUnicodeTable______________________________________global variable
  
  The  operation  Encode translates a unicode string ustr into a GAP string in
  some specified encoding. The default encoding is "UTF-8".
  
  Supported  encodings  can  be  found  in UNICODE_RECODE.NormalizedEncodings.
  Except  for some cases mentioned below characters which are not available in
  the target encoding are substituted by '?' characters.
  
  If the encoding is "URL" (see Unicode (6.2-1)) then an optional argument
  encreserved can be given; it must be a list of reserved characters which
  should be percent encoded. The default is to encode only the % character.
  
  The  encoding  "LaTeX"  substitutes  non-ASCII  characters and LaTeX special
  characters  by  LaTeX  code as given in an ordered list LaTeXUnicodeTable of
  pairs  [codepoint,  string].  If  you  have a unicode character for which no
  substitution  is  contained  in  that  list,  you will get a warning and the
  translation  is  Unicode(nr).  In  this  case  find a substitution and add a
  corresponding  [codepoint,  string]  pair  to LaTeXUnicodeTable using AddSet
  (Reference: AddSet). Also, please tell the GAPDoc authors about your
  addition so that we can extend the list LaTeXUnicodeTable. (Most of the
  initial  entries  were  generated  from lists in the TeX projects encTeX and
  ucs.) There are some variants of this encoding:
  
  "LaTeXleavemarkup"  does  the same translations for non-ASCII characters but
  leaves the LaTeX special characters (e.g., any LaTeX commands) as they are.
  
  "LaTeXUTF8"  does  not  give  a  warning  about  unicode  characters without
  explicit  translation,  instead  it  translates  the  character to its UTF-8
  encoding. Make sure to set up your LaTeX document such that all these
  characters are understood.
  
  "LaTeXUTF8leavemarkup" is a combination of the last two variants.
  
  Note  that the "LaTeX" encoding can only be used with Encode but not for the
  opposite  translation  with  Unicode  (6.2-1)  (which  would  need  far  too
  complicated heuristics).
  
  The   function  SimplifiedUnicodeString  can  be  used  to  substitute  many
  non-ASCII  characters  by  related  ASCII  characters or strings (e.g., by a
  corresponding character without accents). The argument ustr and the result
  are unicode strings. If encoding is "ASCII" then all non-ASCII characters
  are translated, otherwise only the non-latin1 characters. If the string
  "single" is given as an argument then only substitutions which do not make
  the result string longer are considered. The translations are stored in a
  sorted list SimplifiedUnicodeTable. Its entries are of the form [codepoint,
  trans1, trans2, ...]. Here each of trans1, trans2 and so on is either an
  integer for the codepoint of a substitution character or a list of
  codepoint integers. If you
  are missing characters in this list and know a sensible ASCII approximation,
  then  add  an  entry  (with  AddSet (Reference: AddSet)) and tell the GAPDoc
  authors  about it. (The initial content of SimplifiedUnicodeTable was mainly
  generated from the "transtab" tables by Markus Kuhn.)
  
  The  function  LowercaseUnicodeString  gets and returns a unicode string and
  translates  each uppercase character to its corresponding lowercase version.
  This  function  uses  a  list  LowercaseUnicodeTable  of  pairs of codepoint
  integers.  This  list  was generated using the file UnicodeData.txt from the
  unicode definition (field 14 in each row).
  
  The function UppercaseUnicodeString does the analogous translation to
  uppercase characters.
  
  ---------------------------  Example  ----------------------------
    gap> ustr := Unicode("a and &#246;", "XML");
    Unicode("a and ö")
    gap> SimplifiedUnicodeString(ustr, "ASCII");
    Unicode("a and oe")
    gap> SimplifiedUnicodeString(ustr, "ASCII", "single");
    Unicode("a and o")
    gap> ustr2 := UppercaseUnicodeString(ustr);;
    gap> Print(Encode(ustr2, GAPInfo.TermEncoding), "\n");
    A AND Ö
  ------------------------------------------------------------------
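
  The following sketch encodes the same unicode string in three of the
  supported encodings and decodes one result again. It relies on the
  substitution by '?' described above for characters that are missing in
  the target encoding; the variable names are chosen for this example only.

```gap
LoadPackage("GAPDoc");

ustr := Unicode("a and &#246;", "XML");

utf8 := Encode(ustr);             # default encoding "UTF-8"
latin := Encode(ustr, "latin1");  # one byte per character
ascii := Encode(ustr, "ASCII");   # o-umlaut is not available in ASCII

# Decoding with the matching encoding gives back an equal unicode string.
roundtrip := Unicode(utf8, "UTF-8") = ustr;

Print(Length(utf8), " ", Length(latin), " ", ascii, " ", roundtrip, "\n");
```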
  
  
  6.2-3 Lengths of UTF-8 strings
  
  > WidthUTF8String( str ) ___________________________________________function
  > NrCharsUTF8String( str ) _________________________________________function
  Returns:  an integer
  
  Let  str  be  a  GAP  string  with  text  in UTF-8 encoding. There are three
  "lengths" of such a string which must be distinguished. The operation Length
  (Reference:  Length)  returns the number of bytes and so the memory occupied
  by  str.  The  function  NrCharsUTF8String  returns  the  number  of unicode
  characters in str, that is the length of Unicode(str).
  
  In many applications the function WidthUTF8String is more interesting: it
  returns the number of columns needed by the string if printed to a terminal.
  This   takes  into  account  that  some  unicode  characters  are  combining
  characters  and that there are wide characters which need two columns (e.g.,
  for  Chinese  or Japanese). (To be precise: This implementation assumes that
  there are no control characters in str and uses the character width returned
  by the wcwidth function in the GNU C-library called with UTF-8 locale.)
  
  ---------------------------  Example  ----------------------------
    gap> # A, German umlaut u, B, zero width space, C, newline
    gap> str := Encode( Unicode( "A&#xFC;B&#x200B;C\n", "XML" ) );;
    gap> Print(str);
    AüB​C
    gap> # umlaut u needs two bytes and the zero width space three
    gap> Length(str);
    9
    gap> NrCharsUTF8String(str);
    6
    gap> # zero width space and newline don't contribute to width
    gap> WidthUTF8String(str);
    4
  ------------------------------------------------------------------
  
  
  6.3 Print Utilities
  
  The  following  printing  utilities  turned out to be useful for interactive
  work  with  texts  in GAP. But they are more general and so we document them
  here.
  
  6.3-1 PrintTo1
  
  > PrintTo1( filename, fun ) ________________________________________function
  > AppendTo1( filename, fun ) _______________________________________function
  
  The  argument  fun must be a function without arguments. Everything which is
  printed by a call fun() is written to the file filename. As with PrintTo
  (Reference: PrintTo) and AppendTo (Reference: AppendTo), this overwrites or
  appends to, respectively, the previous content of filename.
  
  These functions can be particularly efficient when many small pieces of text
  are written to a file, because the file does not need to be reopened
  repeatedly.
  
  ---------------------------  Example  ----------------------------
    gap> f := function() local i; 
    >   for i in [1..100000] do Print(i, "\n"); od; end;; 
    gap> PrintTo1("nonsense", f); # now check the local file `nonsense'
  ------------------------------------------------------------------
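
  The difference between overwriting and appending can be checked by
  reading the file back with StringFile (6.3-5). This is a sketch; the file
  name and the use of a temporary directory are choices made for this
  example only.

```gap
LoadPackage("GAPDoc");

fname := Filename(DirectoryTemporary(), "printto1-demo.txt");

PrintTo1(fname, function() Print("first\n"); end);    # overwrites
AppendTo1(fname, function() Print("second\n"); end);  # appends

content := StringFile(fname);
Print(content);
```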
  
  6.3-2 StringPrint
  
  > StringPrint( obj1[, obj2[, ...]] ) _______________________________function
  > StringView( obj ) ________________________________________________function
  
  These  functions return a string containing the output of a Print or ViewObj
  call with the same arguments.
  
  This should be considered as a (temporary?) hack. It would be better to have
  String (Reference: String) methods for all GAP objects and to have a generic
  Print (Reference: Print)-function which just interprets these strings.
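
  A short sketch of both functions follows. It uses a string argument to
  show that the result of Print and of ViewObj can differ; the variable
  names are chosen for this example only.

```gap
LoadPackage("GAPDoc");

s1 := StringPrint("x = ", 3 * 7);  # same text as Print("x = ", 21)
s2 := StringPrint("abc");          # Print shows a string without quotes
s3 := StringView("abc");           # ViewObj shows it with quotes

Print(s1, "\n", s2, "\n", s3, "\n");
```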
  
  6.3-3 PrintFormattedString
  
  > PrintFormattedString( str ) ______________________________________function
  
  This function prints a string str. The difference from Print(str); is that no
  additional  line breaks are introduced by GAP's standard printing mechanism.
  This  can  be  used  to print lines which are longer than the current screen
  width. In particular one can print text which contains escape sequences like
  those  explained  in  TextAttr (6.1-2), where lines may have more characters
  than visible characters.
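
  As a sketch, the following prints a line of 200 characters, first plain
  and then with markup. With Print the same string may be broken into
  several lines, depending on the screen width. The variable name long is
  chosen for this example only.

```gap
LoadPackage("GAPDoc");

long := RepeatedString("0123456789", 200);

# The string is sent to the terminal without additional line breaks.
PrintFormattedString(long);
Print("\n");

# The markup adds bytes to the string but no visible characters.
PrintFormattedString(WrapTextAttribute(long, TextAttr.underscore));
Print("\n");
```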
  
  6.3-4 Page
  
  > Page( ... ) ______________________________________________________function
  > PageDisplay( obj ) _______________________________________________function
  
  These  functions  are  similar  to  Print  (Reference:  Print)  and  Display
  (Reference: Display), respectively. The difference is that the output is not
  sent  directly to the screen, but is piped into the current pager; see PAGER
  (Reference: Pager).
  
  ---------------------------  Example  ----------------------------
    gap> Page([1..1421]+0);
    gap> PageDisplay(CharacterTable("Symmetric", 14));
  ------------------------------------------------------------------
  
  6.3-5 StringFile
  
  > StringFile( filename ) ___________________________________________function
  > FileString( filename, str[, append] ) ____________________________function
  
  The  function  StringFile  returns the content of file filename as a string.
  This  works  efficiently with arbitrary (binary or text) files. If something
  went wrong, this function returns fail.
  
  Conversely  the  function FileString writes the content of a string str into
  the file filename. If the optional third argument append is given and equals
  true  then  the  content  of str is appended to the file. Otherwise previous
  content  of  the  file is deleted. This function returns the number of bytes
  written or fail if something went wrong.
  
  Both functions are quite efficient, even with large files.
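
  The following sketch writes a file, appends to it, reads it back and
  shows the result for a file that does not exist. The file names and the
  use of a temporary directory are choices made for this example only.

```gap
LoadPackage("GAPDoc");

dir := DirectoryTemporary();
fname := Filename(dir, "demo.txt");

n1 := FileString(fname, "first line\n");         # replaces old content
n2 := FileString(fname, "second line\n", true);  # appends to the file
content := StringFile(fname);

# Reading a file that does not exist yields fail instead of an error.
missing := StringFile(Filename(dir, "no-such-file"));

Print(n1, " ", n2, " ", missing, "\n", content);
```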