Sunday, 9 March 2008

Oracle External Tables

An external table enables you to access data in a flat file as if it were in a table in the database.

External tables can be used to read the contents of a log file or spread sheet export, for example, or to import information into a data warehouse as part of an ETL operation.

A Simple Example

Suppose we have a file containing city populations in our /tmp directory and we wish to access this data from within our database.
Tokyo,Japan,33600000
Seoul,South Korea,23400000
Mexico City,Mexico,22400000
New York,USA,21900000
Bombay,India,21600000
Delhi,India,21500000
Sao Paulo,Brazil,20600000
Los Angeles,USA,18000000
Shanghai,China,17500000
Osaka,Japan,16700000

First, we create a DIRECTORY object which will reference the /tmp directory in our file system.

CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';
Directory created.

Note: This command format will work with Windows and Linux.

Now, we create an external table by specifying the name of the file we wish to query and the directory where it can be located. We will rely on the default values of comma for a field separator and new line as the record delimiter, to describe our data.

CREATE TABLE pop_ext (
city VARCHAR2(30),
country VARCHAR2(30),
population NUMBER)
ORGANIZATION EXTERNAL
(DEFAULT DIRECTORY tmp_dir
LOCATION('pop.txt'));
Table created.

The file can now be queried as if it were a table in the database.

SELECT * 
FROM pop_ext;
CITY              COUNTRY          POPULATION             
----------------- ---------------- ------------------
Tokyo Japan 33600000
Seoul South Korea 23400000
Mexico City Mexico 22400000
New York USA 21900000
Bombay India 21600000
Delhi India 21500000
Sao Paulo Brazil 20600000
Los Angeles USA 18000000
Shanghai China 17500000
Osaka Japan 16700000
10 rows selected

External tables are read-only. An attempt to update the table will result in the following error message, complete with some useful advice on the appropriate action to take.

UPDATE pop_ext 
SET population = 0;
30657.0000 -  "operation not supported on external organized table"
*Cause: User attempted on operation on an external table which is
not supported.
*Action: Don't do that!

We could, of course, create a 'real' table containing the city populations if we wish to maintain the data.

CREATE TABLE pop AS
SELECT * FROM pop_ext;
Table created.

A More Detailed Example

In this example we have a file which contains bank holidays in the UK for 2007 to 2009.
"Holiday"       2007            2008            2009
"New Year" 01/01/07 01/01/08 01/01/09
"Good Friday" 06/04/07 21/03/08 10/04/09
"Easter Monday" 09/04/07 24/03/08 13/04/09
"Early May" 07/05/07 05/05/08 04/05/09
"(First Monday in May)"
"Spring Bank" 28/05/07 26/05/08 25/05/09
"(Last Monday in May)"
"Summer" 27/08/07 25/08/08 31/08/09
"(Last Monday in August)"
"Christmas Day" 25/12/07 25/12/08 25/12/09
"Boxing Day" 26/12/07 26/12/08 28/12/09

This time, we will include a few more options in the table definition to describe the format of the data and to exclude the comment lines (the text surrounded by brackets).

CREATE OR REPLACE DIRECTORY log_dir AS '/logs';
Directory created.

CREATE TABLE bank_hol_ext(
hol_desc VARCHAR2(50),
hol_2007 DATE,
hol_2008 DATE,
hol_2009 DATE
)
ORGANIZATION EXTERNAL
(TYPE oracle_loader
DEFAULT DIRECTORY tmp_dir
ACCESS PARAMETERS(
records delimited BY newline
load when (hol_2007 != BLANKS)
badfile log_dir:'bank_holidays.bad'
logfile log_dir:'bank_holidays.log'
discardfile log_dir:'bank_holidays.dsc'
skip 1
fields terminated by '\t'
optionally enclosed by '"'
missing field values are null
(
hol_desc,
hol_2007 DATE "dd/mm/yy",
hol_2008 DATE "dd/mm/yy",
hol_2009 DATE "dd/mm/yy"
)
)
LOCATION('hols.txt'))
REJECT LIMIT UNLIMITED;
Table created.

SELECT *
FROM bank_hol_ext;

HOL_DESC HOL_2007 HOL_2008 HOL_2009
----------------------------- -------------- -------------- ------------
New Year's Day 01-jan-2007 01-jan-2008 01-jan-2009
Good Friday 06-apr-2007 21-mar-2008 10-apr-2009
Easter Monday 09-apr-2007 24-mar-2008 13-apr-2009
Early May Bank Holiday 07-may-2007 05-may-2008 04-may-2009
Spring Bank Holiday 28-may-2007 26-may-2008 25-may-2009
Summer Bank Holiday 27-aug-2007 25-aug-2008 31-aug-2009
Christmas Day 25-dec-2007 25-dec-2008 25-dec-2009
Boxing Day 26-dec-2007 26-dec-2008 28-dec-2009
8 rows selected

Here, we have specified that the first row (the title) should be skipped, that fields may be surrounded by quotes and that a tab is used as the field delimiter. We also tell Oracle the format of the date fields and that we don't wish to impose a reject limit for bad records.

The logfile, badfile and discardfile clauses specify the location and file names for these file types. The logfile contains messages generated by the external tables utility while it is accessing data in the data file. The badfile contains the rows that could not be loaded due to errors (for example, a date was not valid for the format model) and the discardfile contains the rows excluded by the condition in the LOAD WHEN clause.

The default behaviour is to create these files in the same directory as the data file. However, if we only have read access to this directory, we can use these clauses to create the files in another location. In our example, we have chosen to write the files to the /logs directory.

If we wish, we can create another external table to view the contents of these files. Here, we define a table to query the log file, which contains a single column with a fixed width of four thousand bytes.

CREATE TABLE bank_hol_rej_ext (
rec VARCHAR2(4000))
ORGANIZATION EXTERNAL
(DEFAULT DIRECTORY log_dir
ACCESS PARAMETERS(
records delimited by newline
fields
missing field VALUES ARE NULL
(
rec position(1:4000)
)
)
LOCATION('bank_holidays.log'));
Table created.

SELECT *
FROM bank_hol_rej_ext
WHERE rownum <=10
AND rec IS NOT NULL;
REC
---------------------------------------------------------------
LOG file opened at 03/04/08 21:23:35
Field Definitions for table BANK_HOL_EXT
Record format DELIMITED BY NEWLINE
Data in file has same endianness as the platform
Rows with all null fields are accepted
Load when (HOL_2007 != BLANKS)
Fields in Data Source:
HOL_DESC CHAR (255)
Terminated by " "
Enclosed by """ and """
10 rows selected

We can change the table definition to reference a different file using the ALTER TABLE command.

ALTER TABLE bank_hol_rej_ext LOCATION ('bank_holidays.dsc');
Table altered.
This allows us to query the discard file with the same external table.
SELECT * 
FROM bank_hol_rej_ext;
REC
----------------------------------
"(First Monday in May)"
"(Last Monday in May)"
"(Last Monday in August)"
3 rows selected.

Data Pump

Oracle 10g introduced the option to create an external table using data constructed from within the database.
CREATE TABLE emp_ext
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY tmp_dir
LOCATION ('emp.dp')
)
AS
SELECT empno, ename, job, sal, comm, dname
FROM emp JOIN dept USING (deptno);
Table created.

This SQL will create a flat file in the /tmp directory containing the results of the SELECT statement. Unfortunately, the file is a proprietary binary format which can only be read by Data Pump. It would be useful to have the option to produce a pure CSV format which could easily be read by other tools and utilities. However, the file is readable across platforms, which allows us to move it between Windows and Linux and create another external table on the target machine, to read the data.

CREATE TABLE emp_ext (
empno NUMBER(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
sal NUMBER(7,2),
comm NUMBER(7,2),
dname VARCHAR2(14)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY tmp_dir
LOCATION ('emp.dp')
);
Table created.

This can be useful for transferring arbitrary data from one database to another where a database link is not available.

3 comments:

Anonymous said...

Спасибо понравилось ! Thanks !

Anonymous said...

Hi Mark,

Very useful and informative.
Youe article is very easy to understand.Thanks for publishing.

Ravi Rajasekharuni

Anonymous said...

I have attempted to use the external table process as a replacement for SQL*Loader. It was
progressing well until I bumped into an issue with NULLs.

I am loading the following table:
CREATE TABLE TEST_TBL
(CLAIM_ID VARCHAR2(38) NOT NULL,
DIAG_SEQUENCE NUMBER(10) NOT NULL,
POLICY_HOLDER_ID VARCHAR2(26) NOT NULL,
SUFFIX_ID VARCHAR2(4) NOT NULL,
DIAG_CD VARCHAR2(7) NOT NULL,
DISCHARGE_DATE DATE );

Notice that most of the fields are NOT NULL.

I am using the following external definition:

Create table Test_Tbl_Ext
(CLAIM_ID VARCHAR2(38),
DIAG_SEQUENCE NUMBER(10),
POLICY_HOLDER_ID VARCHAR2(26),
SUFFIX_ID VARCHAR2(4),
DIAG_CD VARCHAR2(7),
DISCHARGE_DATE DATE
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY Test_external_tables
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE
BADFILE 'Test_tbl.bad'
LOGFILE 'Test_tbl.log'
FIELDS TERMINATED BY ' '
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
( CLAIM_ID,
DIAG_SEQUENCE,
POLICY_HOLDER_ID,
SUFFIX_ID,
DIAG_CD,
DISCHARGE_DATE date mask "MM/DD/YYYY"
)
)
LOCATION ('Test_tbl.tab')
) REJECT LIMIT UNLIMITED;

As you can see, the records are tab delimited. In testing this, I use the basic "insert into
test_tbl ( select * from test_tbl_ext );" syntax.

I was surprised to discover that while it behaves like SQL*Loader and rejects format, size and
other data related issues, it does not handle the NOT NULL issue, it just stops with "ORA-01400:
cannot insert NULL into ("TEST"."TEST_TBL"."DIAG_CD").

Is there no good way to get an EXTERNAL table to handle the NULLs properly and put them in the bad
file ?
This would be very usefull to me.

Thanks in advance.