Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > a866202fe868538f89a755dbcabc378b > files > 325

postgresql8.2-docs-8.2.14-1mdv2010.0.i586.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>Character Set Support</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REV="MADE"
HREF="mailto:pgsql-docs@postgresql.org"><LINK
REL="HOME"
TITLE="PostgreSQL 8.2.14 Documentation"
HREF="index.html"><LINK
REL="UP"
TITLE="Localization"
HREF="charset.html"><LINK
REL="PREVIOUS"
TITLE="Locale Support"
HREF="locale.html"><LINK
REL="NEXT"
TITLE="Routine Database Maintenance Tasks"
HREF="maintenance.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="stylesheet.css"><META
HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=ISO-8859-1"><META
NAME="creation"
CONTENT="2009-09-04T05:25:47"></HEAD
><BODY
CLASS="SECT1"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="5"
ALIGN="center"
VALIGN="bottom"
>PostgreSQL 8.2.14 Documentation</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="locale.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="charset.html"
>Fast Backward</A
></TD
><TD
WIDTH="60%"
ALIGN="center"
VALIGN="bottom"
>Chapter 21. Localization</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="top"
><A
HREF="charset.html"
>Fast Forward</A
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="top"
><A
HREF="maintenance.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="MULTIBYTE"
>21.2. Character Set Support</A
></H1
><A
NAME="AEN23715"
></A
><P
>   The character set support in <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>
   allows you to store text in a variety of character sets, including
   single-byte character sets such as the ISO 8859 series and
   multiple-byte character sets such as <ACRONYM
CLASS="ACRONYM"
>EUC</ACRONYM
> (Extended Unix
   Code), UTF-8, and Mule internal code.  All supported character sets
   can be used transparently by clients, but a few are not supported
   for use within the server (that is, as a server-side encoding).
   The default character set is selected while
   initializing your <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> database
   cluster using <TT
CLASS="COMMAND"
>initdb</TT
>.  It can be overridden when you
   create a database, so you can have multiple
   databases each with a different character set.
  </P
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="MULTIBYTE-CHARSET-SUPPORTED"
>21.2.1. Supported Character Sets</A
></H2
><P
>     <A
HREF="multibyte.html#CHARSET-TABLE"
>Table 21-1</A
> shows the character sets available
     for use in <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>.
    </P
><DIV
CLASS="TABLE"
><A
NAME="CHARSET-TABLE"
></A
><P
><B
>Table 21-1. <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> Character Sets</B
></P
><TABLE
BORDER="1"
CLASS="CALSTABLE"
><COL><COL><COL><COL><COL><COL><THEAD
><TR
><TH
>Name</TH
><TH
>Description</TH
><TH
>Language</TH
><TH
>Server?</TH
><TH
>Bytes/Char</TH
><TH
>Aliases</TH
></TR
></THEAD
><TBODY
><TR
><TD
><TT
CLASS="LITERAL"
>BIG5</TT
></TD
><TD
>Big Five</TD
><TD
>Traditional Chinese</TD
><TD
>No</TD
><TD
>1-2</TD
><TD
><TT
CLASS="LITERAL"
>WIN950</TT
>, <TT
CLASS="LITERAL"
>Windows950</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_CN</TT
></TD
><TD
>Extended UNIX Code-CN</TD
><TD
>Simplified Chinese</TD
><TD
>Yes</TD
><TD
>1-3</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_JP</TT
></TD
><TD
>Extended UNIX Code-JP</TD
><TD
>Japanese</TD
><TD
>Yes</TD
><TD
>1-3</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_KR</TT
></TD
><TD
>Extended UNIX Code-KR</TD
><TD
>Korean</TD
><TD
>Yes</TD
><TD
>1-3</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_TW</TT
></TD
><TD
>Extended UNIX Code-TW</TD
><TD
>Traditional Chinese, Taiwanese</TD
><TD
>Yes</TD
><TD
>1-3</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>GB18030</TT
></TD
><TD
>National Standard</TD
><TD
>Chinese</TD
><TD
>No</TD
><TD
>1-2</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>GBK</TT
></TD
><TD
>Extended National Standard</TD
><TD
>Simplified Chinese</TD
><TD
>No</TD
><TD
>1-2</TD
><TD
><TT
CLASS="LITERAL"
>WIN936</TT
>, <TT
CLASS="LITERAL"
>Windows936</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_5</TT
></TD
><TD
>ISO 8859-5, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 113</TD
><TD
>Latin/Cyrillic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_6</TT
></TD
><TD
>ISO 8859-6, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 114</TD
><TD
>Latin/Arabic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_7</TT
></TD
><TD
>ISO 8859-7, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 118</TD
><TD
>Latin/Greek</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_8</TT
></TD
><TD
>ISO 8859-8, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 121</TD
><TD
>Latin/Hebrew</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>JOHAB</TT
></TD
><TD
><ACRONYM
CLASS="ACRONYM"
>JOHAB</ACRONYM
></TD
><TD
>Korean (Hangul)</TD
><TD
>Yes</TD
><TD
>1-3</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>KOI8</TT
></TD
><TD
><ACRONYM
CLASS="ACRONYM"
>KOI</ACRONYM
>8-R(U)</TD
><TD
>Cyrillic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>KOI8R</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN1</TT
></TD
><TD
>ISO 8859-1, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 94</TD
><TD
>Western European</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO88591</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN2</TT
></TD
><TD
>ISO 8859-2, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 94</TD
><TD
>Central European</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO88592</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN3</TT
></TD
><TD
>ISO 8859-3, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 94</TD
><TD
>South European</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO88593</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN4</TT
></TD
><TD
>ISO 8859-4, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 94</TD
><TD
>North European</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO88594</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN5</TT
></TD
><TD
>ISO 8859-9, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 128</TD
><TD
>Turkish</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO88599</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN6</TT
></TD
><TD
>ISO 8859-10, <ACRONYM
CLASS="ACRONYM"
>ECMA</ACRONYM
> 144</TD
><TD
>Nordic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO885910</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN7</TT
></TD
><TD
>ISO 8859-13</TD
><TD
>Baltic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO885913</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN8</TT
></TD
><TD
>ISO 8859-14</TD
><TD
>Celtic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO885914</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN9</TT
></TD
><TD
>ISO 8859-15</TD
><TD
>LATIN1 with Euro and accents</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>ISO885915</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN10</TT
></TD
><TD
>ISO 8859-16, <ACRONYM
CLASS="ACRONYM"
>ASRO</ACRONYM
> SR 14111</TD
><TD
>Romanian</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ISO885916</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
></TD
><TD
>Mule internal code</TD
><TD
>Multilingual Emacs</TD
><TD
>Yes</TD
><TD
>1-4</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>SJIS</TT
></TD
><TD
>Shift JIS</TD
><TD
>Japanese</TD
><TD
>No</TD
><TD
>1-2</TD
><TD
><TT
CLASS="LITERAL"
>Mskanji</TT
>, <TT
CLASS="LITERAL"
>ShiftJIS</TT
>, <TT
CLASS="LITERAL"
>WIN932</TT
>, <TT
CLASS="LITERAL"
>Windows932</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>SQL_ASCII</TT
></TD
><TD
>unspecified (see text)</TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>any</I
></SPAN
></TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>UHC</TT
></TD
><TD
>Unified Hangul Code</TD
><TD
>Korean</TD
><TD
>No</TD
><TD
>1-2</TD
><TD
><TT
CLASS="LITERAL"
>WIN949</TT
>, <TT
CLASS="LITERAL"
>Windows949</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>UTF8</TT
></TD
><TD
>Unicode, 8-bit</TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>all</I
></SPAN
></TD
><TD
>Yes</TD
><TD
>1-4</TD
><TD
><TT
CLASS="LITERAL"
>Unicode</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN866</TT
></TD
><TD
>Windows CP866</TD
><TD
>Cyrillic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ALT</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN874</TT
></TD
><TD
>Windows CP874</TD
><TD
>Thai</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1250</TT
></TD
><TD
>Windows CP1250</TD
><TD
>Central European</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1251</TT
></TD
><TD
>Windows CP1251</TD
><TD
>Cyrillic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>WIN</TT
></TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1252</TT
></TD
><TD
>Windows CP1252</TD
><TD
>Western European</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1253</TT
></TD
><TD
>Windows CP1253</TD
><TD
>Greek</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1254</TT
></TD
><TD
>Windows CP1254</TD
><TD
>Turkish</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1255</TT
></TD
><TD
>Windows CP1255</TD
><TD
>Hebrew</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1256</TT
></TD
><TD
>Windows CP1256</TD
><TD
>Arabic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1257</TT
></TD
><TD
>Windows CP1257</TD
><TD
>Baltic</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
>&nbsp;</TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1258</TT
></TD
><TD
>Windows CP1258</TD
><TD
>Vietnamese</TD
><TD
>Yes</TD
><TD
>1</TD
><TD
><TT
CLASS="LITERAL"
>ABC</TT
>, <TT
CLASS="LITERAL"
>TCVN</TT
>, <TT
CLASS="LITERAL"
>TCVN5712</TT
>, <TT
CLASS="LITERAL"
>VSCII</TT
></TD
></TR
></TBODY
></TABLE
></DIV
><P
>      Not all <ACRONYM
CLASS="ACRONYM"
>API</ACRONYM
>s support all the listed character sets. For example, the
      <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>
      JDBC driver does not support <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>, <TT
CLASS="LITERAL"
>LATIN6</TT
>,
      <TT
CLASS="LITERAL"
>LATIN8</TT
>, and <TT
CLASS="LITERAL"
>LATIN10</TT
>.
     </P
><P
>      The <TT
CLASS="LITERAL"
>SQL_ASCII</TT
> setting behaves considerably differently
      from the other settings.  When the server character set is
      <TT
CLASS="LITERAL"
>SQL_ASCII</TT
>, the server interprets byte values 0-127
      according to the ASCII standard, while byte values 128-255 are taken
      as uninterpreted characters.  No encoding conversion will be done when
      the setting is <TT
CLASS="LITERAL"
>SQL_ASCII</TT
>.  Thus, this setting is not so
      much a declaration that a specific encoding is in use, as a declaration
      of ignorance about the encoding.  In most cases, if you are
      working with any non-ASCII data, it is unwise to use the
      <TT
CLASS="LITERAL"
>SQL_ASCII</TT
> setting, because
      <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> will be unable to help you by
      converting or validating non-ASCII characters.
     </P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN24107"
>21.2.2. Setting the Character Set</A
></H2
><P
>     <TT
CLASS="COMMAND"
>initdb</TT
> defines the default character set
     for a <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> cluster. For example,

</P><PRE
CLASS="SCREEN"
>initdb -E EUC_JP</PRE
><P>

     sets the default character set (encoding) to
     <TT
CLASS="LITERAL"
>EUC_JP</TT
> (Extended Unix Code for Japanese).  You
     can use <TT
CLASS="OPTION"
>--encoding</TT
> instead of
     <TT
CLASS="OPTION"
>-E</TT
> if you prefer to type longer option strings.
     If no <TT
CLASS="OPTION"
>-E</TT
> or <TT
CLASS="OPTION"
>--encoding</TT
> option is
     given, <TT
CLASS="COMMAND"
>initdb</TT
> attempts to determine the appropriate
     encoding to use based on the specified or default locale.
    </P
><P
>     You can create a database with a different character set:

</P><PRE
CLASS="SCREEN"
>createdb -E EUC_KR korean</PRE
><P>

     This will create a database named <TT
CLASS="LITERAL"
>korean</TT
> that
     uses the character set <TT
CLASS="LITERAL"
>EUC_KR</TT
>.  Another way to
     accomplish this is to use this SQL command:

</P><PRE
CLASS="PROGRAMLISTING"
>CREATE DATABASE korean WITH ENCODING 'EUC_KR';</PRE
><P>

     The encoding for a database is stored in the system catalog
     <TT
CLASS="LITERAL"
>pg_database</TT
>.  You can see that by using the
     <TT
CLASS="OPTION"
>-l</TT
> option or the <TT
CLASS="COMMAND"
>\l</TT
> command
     of <TT
CLASS="COMMAND"
>psql</TT
>.

</P><PRE
CLASS="SCREEN"
>$ <KBD
CLASS="USERINPUT"
>psql -l</KBD
>
            List of databases
   Database    |  Owner  |   Encoding    
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 postgres      | t-ishii | EUC_JP
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 utf8          | t-ishii | UTF8
(9 rows)</PRE
><P>
    </P
><DIV
CLASS="IMPORTANT"
><BLOCKQUOTE
CLASS="IMPORTANT"
><P
><B
>Important: </B
>      Although you can specify any encoding you want for a database, it is
      unwise to choose an encoding that is not what is expected by the locale
      you have selected.  The <TT
CLASS="LITERAL"
>LC_COLLATE</TT
> and
      <TT
CLASS="LITERAL"
>LC_CTYPE</TT
> settings imply a particular encoding,
      and locale-dependent operations (such as sorting) are likely to
      misinterpret data that is in an incompatible encoding.
     </P
><P
>      Since these locale settings are frozen by <TT
CLASS="COMMAND"
>initdb</TT
>, the
      apparent flexibility to use different encodings in different databases
      of a cluster is more theoretical than real.  It is likely that these
      mechanisms will be revisited in future versions of
      <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>.
     </P
><P
>      One way to use multiple encodings safely is to set the locale to
      <TT
CLASS="LITERAL"
>C</TT
> or <TT
CLASS="LITERAL"
>POSIX</TT
> during <TT
CLASS="COMMAND"
>initdb</TT
>, thus
      disabling any real locale awareness.
     </P
></BLOCKQUOTE
></DIV
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN24141"
>21.2.3. Automatic Character Set Conversion Between Server and Client</A
></H2
><P
>     <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> supports automatic
     character set conversion between server and client for certain
     character set combinations. The conversion information is stored in the
     <TT
CLASS="LITERAL"
>pg_conversion</TT
> system catalog.  <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
>
     comes with some predefined conversions, as shown in <A
HREF="multibyte.html#MULTIBYTE-TRANSLATION-TABLE"
>Table 21-2</A
>. You can create a new
     conversion using the SQL command <TT
CLASS="COMMAND"
>CREATE CONVERSION</TT
>.
    </P
><DIV
CLASS="TABLE"
><A
NAME="MULTIBYTE-TRANSLATION-TABLE"
></A
><P
><B
>Table 21-2. Client/Server Character Set Conversions</B
></P
><TABLE
BORDER="1"
CLASS="CALSTABLE"
><COL><COL><THEAD
><TR
><TH
>Server Character Set</TH
><TH
>Available Client Character Sets</TH
></TR
></THEAD
><TBODY
><TR
><TD
><TT
CLASS="LITERAL"
>BIG5</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>not supported as a server encoding</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_CN</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>EUC_CN</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_JP</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>EUC_JP</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>SJIS</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_KR</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>EUC_KR</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>EUC_TW</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>EUC_TW</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>BIG5</TT
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>GB18030</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>not supported as a server encoding</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>GBK</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>not supported as a server encoding</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_5</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>ISO_8859_5</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>KOI8</TT
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>,
         <TT
CLASS="LITERAL"
>WIN866</TT
>,
         <TT
CLASS="LITERAL"
>WIN1251</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_6</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>ISO_8859_6</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_7</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>ISO_8859_7</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>ISO_8859_8</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>ISO_8859_8</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>JOHAB</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>JOHAB</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>KOI8</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>KOI8</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>ISO_8859_5</TT
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>,
         <TT
CLASS="LITERAL"
>WIN866</TT
>,
         <TT
CLASS="LITERAL"
>WIN1251</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN1</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN1</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN2</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN2</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>,
         <TT
CLASS="LITERAL"
>WIN1250</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN3</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN3</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN4</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN4</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN5</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN5</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN6</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN6</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN7</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN7</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN8</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN8</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN9</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN9</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>LATIN10</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>LATIN10</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>MULE_INTERNAL</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>BIG5</TT
>,
          <TT
CLASS="LITERAL"
>EUC_CN</TT
>,
          <TT
CLASS="LITERAL"
>EUC_JP</TT
>,
          <TT
CLASS="LITERAL"
>EUC_KR</TT
>,
          <TT
CLASS="LITERAL"
>EUC_TW</TT
>,
          <TT
CLASS="LITERAL"
>ISO_8859_5</TT
>,
          <TT
CLASS="LITERAL"
>KOI8</TT
>,
          <TT
CLASS="LITERAL"
>LATIN1</TT
> to <TT
CLASS="LITERAL"
>LATIN4</TT
>,
          <TT
CLASS="LITERAL"
>SJIS</TT
>,
          <TT
CLASS="LITERAL"
>WIN866</TT
>,
          <TT
CLASS="LITERAL"
>WIN1250</TT
>,
          <TT
CLASS="LITERAL"
>WIN1251</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>SJIS</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>not supported as a server encoding</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>SQL_ASCII</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>any (no conversion will be performed)</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>UHC</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>not supported as a server encoding</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>UTF8</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>all supported encodings</I
></SPAN
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN866</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN866</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>ISO_8859_5</TT
>,
          <TT
CLASS="LITERAL"
>KOI8</TT
>,
          <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>,
          <TT
CLASS="LITERAL"
>WIN1251</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN874</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN874</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1250</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1250</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>LATIN2</TT
>,
          <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1251</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1251</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>ISO_8859_5</TT
>,
          <TT
CLASS="LITERAL"
>KOI8</TT
>,
          <TT
CLASS="LITERAL"
>MULE_INTERNAL</TT
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>,
          <TT
CLASS="LITERAL"
>WIN866</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1252</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1252</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1253</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1253</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1254</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1254</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1255</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1255</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1256</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1256</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1257</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1257</I
></SPAN
>,
          <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
><TR
><TD
><TT
CLASS="LITERAL"
>WIN1258</TT
></TD
><TD
><SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>WIN1258</I
></SPAN
>,
         <TT
CLASS="LITERAL"
>UTF8</TT
>
         </TD
></TR
></TBODY
></TABLE
></DIV
><P
>     To enable automatic character set conversion, you have to
     tell <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> the character set
     (encoding) you would like to use in the client. There are several
     ways to accomplish this:

     <P
></P
></P><UL
><LI
><P
>        Using the <TT
CLASS="COMMAND"
>\encoding</TT
> command in
        <SPAN
CLASS="APPLICATION"
>psql</SPAN
>.
        <TT
CLASS="COMMAND"
>\encoding</TT
> allows you to change client
        encoding on the fly. For
        example, to change the encoding to <TT
CLASS="LITERAL"
>SJIS</TT
>, type:

</P><PRE
CLASS="PROGRAMLISTING"
>\encoding SJIS</PRE
><P>
       </P
></LI
><LI
><P
>        Using <SPAN
CLASS="APPLICATION"
>libpq</SPAN
> functions.
        <TT
CLASS="COMMAND"
>\encoding</TT
> actually calls
        <CODE
CLASS="FUNCTION"
>PQsetClientEncoding()</CODE
> for its purpose.

</P><PRE
CLASS="SYNOPSIS"
>int PQsetClientEncoding(PGconn *<TT
CLASS="REPLACEABLE"
><I
>conn</I
></TT
>, const char *<TT
CLASS="REPLACEABLE"
><I
>encoding</I
></TT
>);</PRE
><P>

        where <TT
CLASS="REPLACEABLE"
><I
>conn</I
></TT
> is a connection to the server,
        and <TT
CLASS="REPLACEABLE"
><I
>encoding</I
></TT
> is the encoding you
        want to use. If the function successfully sets the encoding, it returns 0,
        otherwise -1. The current encoding for this connection can be determined by
        using:

</P><PRE
CLASS="SYNOPSIS"
>int PQclientEncoding(const PGconn *<TT
CLASS="REPLACEABLE"
><I
>conn</I
></TT
>);</PRE
><P>

        Note that it returns the encoding ID, not a symbolic string
        such as <TT
CLASS="LITERAL"
>EUC_JP</TT
>. To convert an encoding ID to an encoding name, you
        can use:

</P><PRE
CLASS="SYNOPSIS"
>char *pg_encoding_to_char(int <TT
CLASS="REPLACEABLE"
><I
>encoding_id</I
></TT
>);</PRE
><P>
       </P
></LI
><LI
><P
>        Using <TT
CLASS="COMMAND"
>SET client_encoding TO</TT
>.

        Setting the client encoding can be done with this SQL command:

</P><PRE
CLASS="PROGRAMLISTING"
>SET CLIENT_ENCODING TO '<TT
CLASS="REPLACEABLE"
><I
>value</I
></TT
>';</PRE
><P>

        Also you can use the standard SQL syntax <TT
CLASS="LITERAL"
>SET NAMES</TT
>
        for this purpose:

</P><PRE
CLASS="PROGRAMLISTING"
>SET NAMES '<TT
CLASS="REPLACEABLE"
><I
>value</I
></TT
>';</PRE
><P>

        To query the current client encoding:

</P><PRE
CLASS="PROGRAMLISTING"
>SHOW client_encoding;</PRE
><P>

        To return to the default encoding:

</P><PRE
CLASS="PROGRAMLISTING"
>RESET client_encoding;</PRE
><P>
       </P
></LI
><LI
><P
>        Using <TT
CLASS="ENVAR"
>PGCLIENTENCODING</TT
>. If the environment variable
        <TT
CLASS="ENVAR"
>PGCLIENTENCODING</TT
> is defined in the client's
        environment, that client encoding is automatically selected
        when a connection to the server is made.  (This can
        subsequently be overridden using any of the other methods
        mentioned above.)
       </P
></LI
><LI
><P
>       Using the configuration variable <A
HREF="runtime-config-client.html#GUC-CLIENT-ENCODING"
>client_encoding</A
>. If the
       <TT
CLASS="VARNAME"
>client_encoding</TT
> variable is set, that client
       encoding is automatically selected when a connection to the
       server is made.  (This can subsequently be overridden using any
       of the other methods mentioned above.)
       </P
></LI
></UL
><P>
    </P
><P
>     If the conversion of a particular character is not possible
     &mdash; suppose you chose <TT
CLASS="LITERAL"
>EUC_JP</TT
> for the
     server and <TT
CLASS="LITERAL"
>LATIN1</TT
> for the client, then some
     Japanese characters do not have a representation in
     <TT
CLASS="LITERAL"
>LATIN1</TT
> &mdash; then an error is reported.
    </P
><P
>     If the client character set is defined as <TT
CLASS="LITERAL"
>SQL_ASCII</TT
>,
     encoding conversion is disabled, regardless of the server's character
     set.  Just as for the server, use of <TT
CLASS="LITERAL"
>SQL_ASCII</TT
> is unwise
     unless you are working with all-ASCII data.
    </P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN24475"
>21.2.4. Further Reading</A
></H2
><P
>     These are good sources to start learning about various kinds of encoding
     systems.

     <P
></P
></P><DIV
CLASS="VARIABLELIST"
><DL
><DT
><A
HREF="http://www.i18ngurus.com/docs/984813247.html"
TARGET="_top"
>http://www.i18ngurus.com/docs/984813247.html</A
></DT
><DD
><P
>         An extensive collection of documents about character sets, encodings,
         and code pages.
        </P
></DD
><DT
><A
HREF="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"
TARGET="_top"
>ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf</A
></DT
><DD
><P
>         Detailed explanations of <TT
CLASS="LITERAL"
>EUC_JP</TT
>,
         <TT
CLASS="LITERAL"
>EUC_CN</TT
>, <TT
CLASS="LITERAL"
>EUC_KR</TT
>,
         <TT
CLASS="LITERAL"
>EUC_TW</TT
> appear in section 3.2.
        </P
></DD
><DT
><A
HREF="http://www.unicode.org/"
TARGET="_top"
>http://www.unicode.org/</A
></DT
><DD
><P
>         The web site of the Unicode Consortium
        </P
></DD
><DT
>RFC 2044</DT
><DD
><P
>         <ACRONYM
CLASS="ACRONYM"
>UTF</ACRONYM
>-8 is defined here.
        </P
></DD
></DL
></DIV
><P>
    </P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="locale.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="maintenance.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Locale Support</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="charset.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Routine Database Maintenance Tasks</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>