<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <HTML ><HEAD ><TITLE >Character Set Support</TITLE ><META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK REV="MADE" HREF="mailto:pgsql-docs@postgresql.org"><LINK REL="HOME" TITLE="PostgreSQL 8.2.14 Documentation" HREF="index.html"><LINK REL="UP" TITLE="Localization" HREF="charset.html"><LINK REL="PREVIOUS" TITLE="Locale Support" HREF="locale.html"><LINK REL="NEXT" TITLE="Routine Database Maintenance Tasks" HREF="maintenance.html"><LINK REL="STYLESHEET" TYPE="text/css" HREF="stylesheet.css"><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"><META NAME="creation" CONTENT="2009-09-04T05:25:47"></HEAD ><BODY CLASS="SECT1" ><DIV CLASS="NAVHEADER" ><TABLE SUMMARY="Header navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TH COLSPAN="5" ALIGN="center" VALIGN="bottom" >PostgreSQL 8.2.14 Documentation</TH ></TR ><TR ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="locale.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="10%" ALIGN="left" VALIGN="top" ><A HREF="charset.html" >Fast Backward</A ></TD ><TD WIDTH="60%" ALIGN="center" VALIGN="bottom" >Chapter 21. Localization</TD ><TD WIDTH="10%" ALIGN="right" VALIGN="top" ><A HREF="charset.html" >Fast Forward</A ></TD ><TD WIDTH="10%" ALIGN="right" VALIGN="top" ><A HREF="maintenance.html" ACCESSKEY="N" >Next</A ></TD ></TR ></TABLE ><HR ALIGN="LEFT" WIDTH="100%"></DIV ><DIV CLASS="SECT1" ><H1 CLASS="SECT1" ><A NAME="MULTIBYTE" >21.2. Character Set Support</A ></H1 ><A NAME="AEN23715" ></A ><P > The character set support in <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > allows you to store text in a variety of character sets, including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as <ACRONYM CLASS="ACRONYM" >EUC</ACRONYM > (Extended Unix Code), UTF-8, and Mule internal code. 
All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > database cluster using <TT CLASS="COMMAND" >initdb</TT >. It can be overridden when you create a database, so you can have multiple databases each with a different character set. </P ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="MULTIBYTE-CHARSET-SUPPORTED" >21.2.1. Supported Character Sets</A ></H2 ><P > <A HREF="multibyte.html#CHARSET-TABLE" >Table 21-1</A > shows the character sets available for use in <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN >. </P ><DIV CLASS="TABLE" ><A NAME="CHARSET-TABLE" ></A ><P ><B >Table 21-1. <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > Character Sets</B ></P ><TABLE BORDER="1" CLASS="CALSTABLE" ><COL><COL><COL><COL><COL><COL><THEAD ><TR ><TH >Name</TH ><TH >Description</TH ><TH >Language</TH ><TH >Server?</TH ><TH >Bytes/Char</TH ><TH >Aliases</TH ></TR ></THEAD ><TBODY ><TR ><TD ><TT CLASS="LITERAL" >BIG5</TT ></TD ><TD >Big Five</TD ><TD >Traditional Chinese</TD ><TD >No</TD ><TD >1-2</TD ><TD ><TT CLASS="LITERAL" >WIN950</TT >, <TT CLASS="LITERAL" >Windows950</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_CN</TT ></TD ><TD >Extended UNIX Code-CN</TD ><TD >Simplified Chinese</TD ><TD >Yes</TD ><TD >1-3</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_JP</TT ></TD ><TD >Extended UNIX Code-JP</TD ><TD >Japanese</TD ><TD >Yes</TD ><TD >1-3</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_KR</TT ></TD ><TD >Extended UNIX Code-KR</TD ><TD >Korean</TD ><TD >Yes</TD ><TD >1-3</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_TW</TT ></TD ><TD >Extended UNIX Code-TW</TD ><TD >Traditional Chinese, Taiwanese</TD ><TD >Yes</TD ><TD >1-3</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >GB18030</TT ></TD ><TD >National Standard</TD ><TD 
>Chinese</TD ><TD >No</TD ><TD >1-4</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >GBK</TT ></TD ><TD >Extended National Standard</TD ><TD >Simplified Chinese</TD ><TD >No</TD ><TD >1-2</TD ><TD ><TT CLASS="LITERAL" >WIN936</TT >, <TT CLASS="LITERAL" >Windows936</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT ></TD ><TD >ISO 8859-5, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 113</TD ><TD >Latin/Cyrillic</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_6</TT ></TD ><TD >ISO 8859-6, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 114</TD ><TD >Latin/Arabic</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_7</TT ></TD ><TD >ISO 8859-7, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 118</TD ><TD >Latin/Greek</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_8</TT ></TD ><TD >ISO 8859-8, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 121</TD ><TD >Latin/Hebrew</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >JOHAB</TT ></TD ><TD ><ACRONYM CLASS="ACRONYM" >JOHAB</ACRONYM ></TD ><TD >Korean (Hangul)</TD ><TD >Yes</TD ><TD >1-3</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >KOI8</TT ></TD ><TD ><ACRONYM CLASS="ACRONYM" >KOI</ACRONYM >8-R(U)</TD ><TD >Cyrillic</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >KOI8R</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN1</TT ></TD ><TD >ISO 8859-1, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94</TD ><TD >Western European</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO88591</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN2</TT ></TD ><TD >ISO 8859-2, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94</TD ><TD >Central European</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO88592</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN3</TT ></TD ><TD >ISO 8859-3, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94</TD ><TD >South European</TD ><TD 
>Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO88593</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN4</TT ></TD ><TD >ISO 8859-4, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 94</TD ><TD >North European</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO88594</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN5</TT ></TD ><TD >ISO 8859-9, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 128</TD ><TD >Turkish</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO88599</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN6</TT ></TD ><TD >ISO 8859-10, <ACRONYM CLASS="ACRONYM" >ECMA</ACRONYM > 144</TD ><TD >Nordic</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO885910</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN7</TT ></TD ><TD >ISO 8859-13</TD ><TD >Baltic</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO885913</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN8</TT ></TD ><TD >ISO 8859-14</TD ><TD >Celtic</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO885914</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN9</TT ></TD ><TD >ISO 8859-15</TD ><TD >LATIN1 with Euro and accents</TD ><TD >Yes</TD ><TD >1</TD ><TD >ISO885915</TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN10</TT ></TD ><TD >ISO 8859-16, <ACRONYM CLASS="ACRONYM" >ASRO</ACRONYM > SR 14111</TD ><TD >Romanian</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ISO885916</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >MULE_INTERNAL</TT ></TD ><TD >Mule internal code</TD ><TD >Multilingual Emacs</TD ><TD >Yes</TD ><TD >1-4</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >SJIS</TT ></TD ><TD >Shift JIS</TD ><TD >Japanese</TD ><TD >No</TD ><TD >1-2</TD ><TD ><TT CLASS="LITERAL" >Mskanji</TT >, <TT CLASS="LITERAL" >ShiftJIS</TT >, <TT CLASS="LITERAL" >WIN932</TT >, <TT CLASS="LITERAL" >Windows932</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >SQL_ASCII</TT ></TD ><TD >unspecified (see text)</TD ><TD ><SPAN CLASS="emphasis" ><I 
CLASS="EMPHASIS" >any</I ></SPAN ></TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >UHC</TT ></TD ><TD >Unified Hangul Code</TD ><TD >Korean</TD ><TD >No</TD ><TD >1-2</TD ><TD ><TT CLASS="LITERAL" >WIN949</TT >, <TT CLASS="LITERAL" >Windows949</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >UTF8</TT ></TD ><TD >Unicode, 8-bit</TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >all</I ></SPAN ></TD ><TD >Yes</TD ><TD >1-4</TD ><TD ><TT CLASS="LITERAL" >Unicode</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN866</TT ></TD ><TD >Windows CP866</TD ><TD >Cyrillic</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ALT</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN874</TT ></TD ><TD >Windows CP874</TD ><TD >Thai</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1250</TT ></TD ><TD >Windows CP1250</TD ><TD >Central European</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1251</TT ></TD ><TD >Windows CP1251</TD ><TD >Cyrillic</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >WIN</TT ></TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1252</TT ></TD ><TD >Windows CP1252</TD ><TD >Western European</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1253</TT ></TD ><TD >Windows CP1253</TD ><TD >Greek</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1254</TT ></TD ><TD >Windows CP1254</TD ><TD >Turkish</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1255</TT ></TD ><TD >Windows CP1255</TD ><TD >Hebrew</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1256</TT ></TD ><TD >Windows CP1256</TD ><TD >Arabic</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1257</TT ></TD ><TD >Windows CP1257</TD ><TD >Baltic</TD ><TD >Yes</TD ><TD >1</TD ><TD > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1258</TT 
></TD ><TD >Windows CP1258</TD ><TD >Vietnamese</TD ><TD >Yes</TD ><TD >1</TD ><TD ><TT CLASS="LITERAL" >ABC</TT >, <TT CLASS="LITERAL" >TCVN</TT >, <TT CLASS="LITERAL" >TCVN5712</TT >, <TT CLASS="LITERAL" >VSCII</TT ></TD ></TR ></TBODY ></TABLE ></DIV ><P > Not all <ACRONYM CLASS="ACRONYM" >API</ACRONYM >s support all the listed character sets. For example, the <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > JDBC driver does not support <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >LATIN6</TT >, <TT CLASS="LITERAL" >LATIN8</TT >, and <TT CLASS="LITERAL" >LATIN10</TT >. </P ><P > The <TT CLASS="LITERAL" >SQL_ASCII</TT > setting behaves considerably differently from the other settings. When the server character set is <TT CLASS="LITERAL" >SQL_ASCII</TT >, the server interprets byte values 0-127 according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is <TT CLASS="LITERAL" >SQL_ASCII</TT >. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the <TT CLASS="LITERAL" >SQL_ASCII</TT > setting, because <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > will be unable to help you by converting or validating non-ASCII characters. </P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN24107" >21.2.2. Setting the Character Set</A ></H2 ><P > <TT CLASS="COMMAND" >initdb</TT > defines the default character set for a <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > cluster. For example, </P><PRE CLASS="SCREEN" >initdb -E EUC_JP</PRE ><P> sets the default character set (encoding) to <TT CLASS="LITERAL" >EUC_JP</TT > (Extended Unix Code for Japanese). You can use <TT CLASS="OPTION" >--encoding</TT > instead of <TT CLASS="OPTION" >-E</TT > if you prefer to type longer option strings. 
If no <TT CLASS="OPTION" >-E</TT > or <TT CLASS="OPTION" >--encoding</TT > option is given, <TT CLASS="COMMAND" >initdb</TT > attempts to determine the appropriate encoding to use based on the specified or default locale. </P ><P > You can create a database with a different character set: </P><PRE CLASS="SCREEN" >createdb -E EUC_KR korean</PRE ><P> This will create a database named <TT CLASS="LITERAL" >korean</TT > that uses the character set <TT CLASS="LITERAL" >EUC_KR</TT >. Another way to accomplish this is to use this SQL command: </P><PRE CLASS="PROGRAMLISTING" >CREATE DATABASE korean WITH ENCODING 'EUC_KR';</PRE ><P> The encoding for a database is stored in the system catalog <TT CLASS="LITERAL" >pg_database</TT >. You can view it with the <TT CLASS="OPTION" >-l</TT > option or the <TT CLASS="COMMAND" >\l</TT > command of <TT CLASS="COMMAND" >psql</TT >. </P><PRE CLASS="SCREEN" >$ <KBD CLASS="USERINPUT" >psql -l</KBD >
            List of databases
   Database    |  Owner  |   Encoding
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 postgres      | t-ishii | EUC_JP
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 utf8          | t-ishii | UTF8
(10 rows)</PRE ><P> </P ><DIV CLASS="IMPORTANT" ><BLOCKQUOTE CLASS="IMPORTANT" ><P ><B >Important: </B > Although you can specify any encoding you want for a database, it is unwise to choose an encoding other than the one expected by the locale you have selected. The <TT CLASS="LITERAL" >LC_COLLATE</TT > and <TT CLASS="LITERAL" >LC_CTYPE</TT > settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding. 
</P ><P > Since these locale settings are frozen by <TT CLASS="COMMAND" >initdb</TT >, the apparent flexibility to use different encodings in different databases of a cluster is more theoretical than real. It is likely that these mechanisms will be revisited in future versions of <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN >. </P ><P > One way to use multiple encodings safely is to set the locale to <TT CLASS="LITERAL" >C</TT > or <TT CLASS="LITERAL" >POSIX</TT > during <TT CLASS="COMMAND" >initdb</TT >, thus disabling any real locale awareness. </P ></BLOCKQUOTE ></DIV ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN24141" >21.2.3. Automatic Character Set Conversion Between Server and Client</A ></H2 ><P > <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the <TT CLASS="LITERAL" >pg_conversion</TT > system catalog. <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > comes with some predefined conversions, as shown in <A HREF="multibyte.html#MULTIBYTE-TRANSLATION-TABLE" >Table 21-2</A >. You can create a new conversion using the SQL command <TT CLASS="COMMAND" >CREATE CONVERSION</TT >. </P ><DIV CLASS="TABLE" ><A NAME="MULTIBYTE-TRANSLATION-TABLE" ></A ><P ><B >Table 21-2. 
Client/Server Character Set Conversions</B ></P ><TABLE BORDER="1" CLASS="CALSTABLE" ><COL><COL><THEAD ><TR ><TH >Server Character Set</TH ><TH >Available Client Character Sets</TH ></TR ></THEAD ><TBODY ><TR ><TD ><TT CLASS="LITERAL" >BIG5</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >not supported as a server encoding</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_CN</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >EUC_CN</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_JP</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >EUC_JP</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >SJIS</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_KR</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >EUC_KR</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >EUC_TW</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >EUC_TW</I ></SPAN >, <TT CLASS="LITERAL" >BIG5</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >GB18030</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >not supported as a server encoding</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >GBK</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >not supported as a server encoding</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_5</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >ISO_8859_5</I ></SPAN >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT >, <TT CLASS="LITERAL" >WIN866</TT >, <TT CLASS="LITERAL" >WIN1251</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_6</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" 
>ISO_8859_6</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_7</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >ISO_8859_7</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >ISO_8859_8</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >ISO_8859_8</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >JOHAB</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >JOHAB</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >KOI8</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >KOI8</I ></SPAN >, <TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT >, <TT CLASS="LITERAL" >WIN866</TT >, <TT CLASS="LITERAL" >WIN1251</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN1</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN1</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN2</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN2</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT >, <TT CLASS="LITERAL" >WIN1250</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN3</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN3</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN4</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN4</I ></SPAN >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN5</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN5</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN6</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I 
CLASS="EMPHASIS" >LATIN6</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN7</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN7</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN8</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN8</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN9</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN9</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >LATIN10</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >LATIN10</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >MULE_INTERNAL</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >MULE_INTERNAL</I ></SPAN >, <TT CLASS="LITERAL" >BIG5</TT >, <TT CLASS="LITERAL" >EUC_CN</TT >, <TT CLASS="LITERAL" >EUC_JP</TT >, <TT CLASS="LITERAL" >EUC_KR</TT >, <TT CLASS="LITERAL" >EUC_TW</TT >, <TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >LATIN1</TT > to <TT CLASS="LITERAL" >LATIN4</TT >, <TT CLASS="LITERAL" >SJIS</TT >, <TT CLASS="LITERAL" >WIN866</TT >, <TT CLASS="LITERAL" >WIN1250</TT >, <TT CLASS="LITERAL" >WIN1251</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >SJIS</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >not supported as a server encoding</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >SQL_ASCII</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >any (no conversion will be performed)</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >UHC</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >not supported as a server encoding</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >UTF8</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >all supported encodings</I ></SPAN > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" 
>WIN866</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN866</I ></SPAN >, <TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT >, <TT CLASS="LITERAL" >WIN1251</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN874</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN874</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1250</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1250</I ></SPAN >, <TT CLASS="LITERAL" >LATIN2</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1251</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1251</I ></SPAN >, <TT CLASS="LITERAL" >ISO_8859_5</TT >, <TT CLASS="LITERAL" >KOI8</TT >, <TT CLASS="LITERAL" >MULE_INTERNAL</TT >, <TT CLASS="LITERAL" >UTF8</TT >, <TT CLASS="LITERAL" >WIN866</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1252</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1252</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1253</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1253</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1254</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1254</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1255</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1255</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1256</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1256</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1257</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1257</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD 
></TR ><TR ><TD ><TT CLASS="LITERAL" >WIN1258</TT ></TD ><TD ><SPAN CLASS="emphasis" ><I CLASS="EMPHASIS" >WIN1258</I ></SPAN >, <TT CLASS="LITERAL" >UTF8</TT > </TD ></TR ></TBODY ></TABLE ></DIV ><P > To enable automatic character set conversion, you must tell <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN > which character set (encoding) the client uses. There are several ways to accomplish this: </P><UL ><LI ><P > Using the <TT CLASS="COMMAND" >\encoding</TT > command in <SPAN CLASS="APPLICATION" >psql</SPAN >. <TT CLASS="COMMAND" >\encoding</TT > lets you change the client encoding on the fly. For example, to change the encoding to <TT CLASS="LITERAL" >SJIS</TT >, type: </P><PRE CLASS="PROGRAMLISTING" >\encoding SJIS</PRE ><P> </P ></LI ><LI ><P > Using <SPAN CLASS="APPLICATION" >libpq</SPAN > functions. <TT CLASS="COMMAND" >\encoding</TT > itself calls <CODE CLASS="FUNCTION" >PQsetClientEncoding()</CODE > to do this. </P><PRE CLASS="SYNOPSIS" >int PQsetClientEncoding(PGconn *<TT CLASS="REPLACEABLE" ><I >conn</I ></TT >, const char *<TT CLASS="REPLACEABLE" ><I >encoding</I ></TT >);</PRE ><P> where <TT CLASS="REPLACEABLE" ><I >conn</I ></TT > is a connection to the server, and <TT CLASS="REPLACEABLE" ><I >encoding</I ></TT > is the encoding you want to use. The function returns 0 if it successfully sets the encoding and -1 otherwise. The current encoding for this connection can be determined by using: </P><PRE CLASS="SYNOPSIS" >int PQclientEncoding(const PGconn *<TT CLASS="REPLACEABLE" ><I >conn</I ></TT >);</PRE ><P> Note that it returns the encoding ID, not a symbolic string such as <TT CLASS="LITERAL" >EUC_JP</TT >. To convert an encoding ID to an encoding name, you can use: </P><PRE CLASS="SYNOPSIS" >char *pg_encoding_to_char(int <TT CLASS="REPLACEABLE" ><I >encoding_id</I ></TT >);</PRE ><P> </P ></LI ><LI ><P > Using <TT CLASS="COMMAND" >SET client_encoding TO</TT >. 
You can set the client encoding with this SQL command: </P><PRE CLASS="PROGRAMLISTING" >SET client_encoding TO '<TT CLASS="REPLACEABLE" ><I >value</I ></TT >';</PRE ><P> You can also use the standard SQL syntax <TT CLASS="LITERAL" >SET NAMES</TT > for this purpose: </P><PRE CLASS="PROGRAMLISTING" >SET NAMES '<TT CLASS="REPLACEABLE" ><I >value</I ></TT >';</PRE ><P> To query the current client encoding: </P><PRE CLASS="PROGRAMLISTING" >SHOW client_encoding;</PRE ><P> To return to the default encoding: </P><PRE CLASS="PROGRAMLISTING" >RESET client_encoding;</PRE ><P> </P ></LI ><LI ><P > Using <TT CLASS="ENVAR" >PGCLIENTENCODING</TT >. If the environment variable <TT CLASS="ENVAR" >PGCLIENTENCODING</TT > is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) </P ></LI ><LI ><P > Using the configuration variable <A HREF="runtime-config-client.html#GUC-CLIENT-ENCODING" >client_encoding</A >. If the <TT CLASS="VARNAME" >client_encoding</TT > variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) </P ></LI ></UL ><P> </P ><P > If a particular character cannot be converted, an error is reported. For example, if you chose <TT CLASS="LITERAL" >EUC_JP</TT > for the server and <TT CLASS="LITERAL" >LATIN1</TT > for the client, some Japanese characters have no representation in <TT CLASS="LITERAL" >LATIN1</TT >, so any attempt to return them to the client fails. </P ><P > If the client character set is defined as <TT CLASS="LITERAL" >SQL_ASCII</TT >, encoding conversion is disabled, regardless of the server's character set. Just as for the server, use of <TT CLASS="LITERAL" >SQL_ASCII</TT > is unwise unless you are working with all-ASCII data. 
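</P ><P > The <SPAN CLASS="APPLICATION" >libpq</SPAN > functions described above can be combined as in the following minimal sketch. It is an illustration only, not code shipped with <SPAN CLASS="PRODUCTNAME" >PostgreSQL</SPAN >: the connection string <TT CLASS="LITERAL" >dbname=test</TT > and the choice of <TT CLASS="LITERAL" >SJIS</TT > are assumptions, and the program expects a running server whose database encoding can be converted to <TT CLASS="LITERAL" >SJIS</TT > (for example <TT CLASS="LITERAL" >EUC_JP</TT >). </P><PRE CLASS="PROGRAMLISTING" >/* Sketch only: assumes a reachable server and a database named "test". */
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;libpq-fe.h&gt;

int
main(void)
{
    /* "dbname=test" is an assumed connection string; adjust as needed. */
    PGconn *conn = PQconnectdb("dbname=test");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    /* Request conversion to and from SJIS for this session. */
    if (PQsetClientEncoding(conn, "SJIS") != 0)
    {
        fprintf(stderr, "could not set client encoding: %s",
                PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    /* PQclientEncoding() returns a numeric ID; map it back to a name. */
    printf("client encoding: %s\n",
           pg_encoding_to_char(PQclientEncoding(conn)));

    PQfinish(conn);
    return EXIT_SUCCESS;
}</PRE ><P> If no conversion exists between the server encoding and the requested client encoding, <CODE CLASS="FUNCTION" >PQsetClientEncoding()</CODE > should return -1, in which case the sketch prints the message from <CODE CLASS="FUNCTION" >PQerrorMessage()</CODE > and exits. 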
</P ></DIV ><DIV CLASS="SECT2" ><H2 CLASS="SECT2" ><A NAME="AEN24475" >21.2.4. Further Reading</A ></H2 ><P > These are good starting points for learning about the various kinds of encoding systems. </P><DIV CLASS="VARIABLELIST" ><DL ><DT ><A HREF="http://www.i18ngurus.com/docs/984813247.html" TARGET="_top" >http://www.i18ngurus.com/docs/984813247.html</A ></DT ><DD ><P > An extensive collection of documents about character sets, encodings, and code pages. </P ></DD ><DT ><A HREF="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf" TARGET="_top" >ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf</A ></DT ><DD ><P > Detailed explanations of <TT CLASS="LITERAL" >EUC_JP</TT >, <TT CLASS="LITERAL" >EUC_CN</TT >, <TT CLASS="LITERAL" >EUC_KR</TT >, and <TT CLASS="LITERAL" >EUC_TW</TT > appear in section 3.2. </P ></DD ><DT ><A HREF="http://www.unicode.org/" TARGET="_top" >http://www.unicode.org/</A ></DT ><DD ><P > The web site of the Unicode Consortium. </P ></DD ><DT >RFC 3629</DT ><DD ><P > <ACRONYM CLASS="ACRONYM" >UTF</ACRONYM >-8 is defined here. </P ></DD ></DL ></DIV ><P> </P ></DIV ></DIV ><DIV CLASS="NAVFOOTER" ><HR ALIGN="LEFT" WIDTH="100%"><TABLE SUMMARY="Footer navigation table" WIDTH="100%" BORDER="0" CELLPADDING="0" CELLSPACING="0" ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" ><A HREF="locale.html" ACCESSKEY="P" >Prev</A ></TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="index.html" ACCESSKEY="H" >Home</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" ><A HREF="maintenance.html" ACCESSKEY="N" >Next</A ></TD ></TR ><TR ><TD WIDTH="33%" ALIGN="left" VALIGN="top" >Locale Support</TD ><TD WIDTH="34%" ALIGN="center" VALIGN="top" ><A HREF="charset.html" ACCESSKEY="U" >Up</A ></TD ><TD WIDTH="33%" ALIGN="right" VALIGN="top" >Routine Database Maintenance Tasks</TD ></TR ></TABLE ></DIV ></BODY ></HTML >