Monday, 30 June 2008

Using Byte and Char with Character Datatypes

Character datatypes such as CHAR, VARCHAR2 and CLOB can be defined using byte or character length semantics as follows.
CREATE TABLE t(x VARCHAR2(10),   
y VARCHAR2(20 BYTE),
z VARCHAR2(30 CHAR));
Table created.
If CHAR or BYTE is not specified then the value defined by the NLS_LENGTH_SEMANTICS parameter is used as the default. We can check the current value of this parameter as follows.
SELECT value
FROM v$nls_parameters
WHERE parameter = 'NLS_LENGTH_SEMANTICS';
VALUE                                                            
---------------
BYTE
1 rows selected
In the above example, table t was created with columns x and y with maximum lengths of 10 and 20 bytes respectively and column z with a maximum length of 30 characters.

But what does this actually mean?

Character Set

During database creation, the character set to be used by CHAR, VARCHAR2 and CLOB datatypes is specified. These are some of the values available for selection.

US7ASCII ASCII 7-bit American (SB)
WE8ISO8859P1 ISO 8859-1 West European 8-bit (SB)
EE8ISO8859P2 ISO 8859-2 East European 8-bit (SB)
JA16VMS JVMS 16-bit Japanese (MB)
KO16KSCCS KSCCS 16-bit Korean (MB)
AL16UTF16 Unicode 3.2 UTF-16 Universal character set (MB)
AL32UTF8 Unicode 3.2 UTF-8 Universal character set (MB)

SB = Single Byte
MB = Multi Byte
The character set for a database can be determined with the following SQL.
SELECT value
FROM v$nls_parameters
WHERE parameter = 'NLS_CHARACTERSET';
VALUE                                    
---------------
WE8ISO8859P1
1 rows selected

When using a single byte character set such as WE8ISO8859P1, a character is always held in a single byte so specifying the BYTE or CHAR qualifier has no effect on storage allocation. However, with a multi-byte character set such as EE8ISO8859P2, a single character may require up to 4 bytes of storage. In this case it may be useful to specify the length of the column in characters and let Oracle figure out how many bytes to allocate.

Consideration must be given to the maximum number of bytes that can be allocated. When using BYTE semantics the maximum is 2000 for CHAR datatypes, and 4000 for VARCHAR2. However, when using the CHAR length qualifier the maximum size that can be specified may be reduced in order to accommodate the character set.

National Character Set

The Unicode or national character set is also specified when creating the database. For example.


AL32UTF8 UTF-8
AL16UTF16 UTF-16

This setting is used by the Unicode datatypes NCHAR, NVARCHAR2 and NCLOB and the size of the column is always specified in characters.

Use the following SQL to determine the national character set details for a database.
SELECT VALUE
FROM nls_database_parameters
WHERE parameter = 'NLS_NCHAR_CHARACTERSET';
VALUE                                    
---------------
UTF8
1 rows selected

1 comment:

Mercia said...

Keep up the good work.