External tables can be used to read the contents of a log file or spread sheet export, for example, or to import information into a data warehouse as part of an ETL operation.
A Simple Example
Suppose we have a file containing city populations in our /tmp directory and we wish to access this data from within our database.Tokyo,Japan,33600000
Seoul,South Korea,23400000
Mexico City,Mexico,22400000
New York,USA,21900000
Bombay,India,21600000
Delhi,India,21500000
Sao Paulo,Brazil,20600000
Los Angeles,USA,18000000
Shanghai,China,17500000
Osaka,Japan,16700000
First, we create a DIRECTORY object which will reference the /tmp directory in our file system.
CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';Directory created.
Note: This command format will work with Windows and Linux.
Now, we create an external table by specifying the name of the file we wish to query and the directory where it can be located. We will rely on the default values of comma for a field separator and new line as the record delimiter, to describe our data.
CREATE TABLE pop_ext (
city VARCHAR2(30),
country VARCHAR2(30),
population NUMBER)
ORGANIZATION EXTERNAL
(DEFAULT DIRECTORY tmp_dir
LOCATION('pop.txt'));Table created.
The file can now be queried as if it were a table in the database.
SELECT *
FROM pop_ext;CITY COUNTRY POPULATION
----------------- ---------------- ------------------
Tokyo Japan 33600000
Seoul South Korea 23400000
Mexico City Mexico 22400000
New York USA 21900000
Bombay India 21600000
Delhi India 21500000
Sao Paulo Brazil 20600000
Los Angeles USA 18000000
Shanghai China 17500000
Osaka Japan 1670000010 rows selected
External tables are read-only. An attempt to update the table will result in the following error message, complete with some useful advice on the appropriate action to take.
UPDATE pop_ext
SET population = 0;30657.0000 - "operation not supported on external organized table"
*Cause: User attempted on operation on an external table which is
not supported.
*Action: Don't do that!
We could, of course, create a 'real' table containing the city populations if we wish to maintain the data.
CREATE TABLE pop AS
SELECT * FROM pop_ext;Table created.
A More Detailed Example
In this example we have a file which contains bank holidays in the UK for 2007 to 2009."Holiday" 2007 2008 2009
"New Year" 01/01/07 01/01/08 01/01/09
"Good Friday" 06/04/07 21/03/08 10/04/09
"Easter Monday" 09/04/07 24/03/08 13/04/09
"Early May" 07/05/07 05/05/08 04/05/09
"(First Monday in May)"
"Spring Bank" 28/05/07 26/05/08 25/05/09
"(Last Monday in May)"
"Summer" 27/08/07 25/08/08 31/08/09
"(Last Monday in August)"
"Christmas Day" 25/12/07 25/12/08 25/12/09
"Boxing Day" 26/12/07 26/12/08 28/12/09
This time, we will include a few more options in the table definition to describe the format of the data and to exclude the comment lines (the text surrounded by brackets).
CREATE OR REPLACE DIRECTORY log_dir AS '/logs';Directory created.
CREATE TABLE bank_hol_ext(
hol_desc VARCHAR2(50),
hol_2007 DATE,
hol_2008 DATE,
hol_2009 DATE
)
ORGANIZATION EXTERNAL
(TYPE oracle_loader
DEFAULT DIRECTORY tmp_dir
ACCESS PARAMETERS(
records delimited BY newline
load when (hol_2007 != BLANKS)
badfile log_dir:'bank_holidays.bad'
logfile log_dir:'bank_holidays.log'
discardfile log_dir:'bank_holidays.dsc'
skip 1
fields terminated by '\t'
optionally enclosed by '"'
missing field values are null
(
hol_desc,
hol_2007 DATE "dd/mm/yy",
hol_2008 DATE "dd/mm/yy",
hol_2009 DATE "dd/mm/yy"
)
)
LOCATION('hols.txt'))
REJECT LIMIT UNLIMITED;Table created.
SELECT *
FROM bank_hol_ext;
HOL_DESC HOL_2007 HOL_2008 HOL_2009
----------------------------- -------------- -------------- ------------
New Year's Day 01-jan-2007 01-jan-2008 01-jan-2009
Good Friday 06-apr-2007 21-mar-2008 10-apr-2009
Easter Monday 09-apr-2007 24-mar-2008 13-apr-2009
Early May Bank Holiday 07-may-2007 05-may-2008 04-may-2009
Spring Bank Holiday 28-may-2007 26-may-2008 25-may-2009
Summer Bank Holiday 27-aug-2007 25-aug-2008 31-aug-2009
Christmas Day 25-dec-2007 25-dec-2008 25-dec-2009
Boxing Day 26-dec-2007 26-dec-2008 28-dec-20098 rows selected
Here, we have specified that the first row (the title) should be skipped, that fields may be surrounded by quotes and that a tab is used as the field delimiter. We also tell Oracle the format of the date fields and that we don't wish to impose a reject limit for bad records.
The logfile, badfile and discardfile clauses specify the location and file names for these file types. The logfile contains messages generated by the external tables utility while it is accessing data in the data file. The badfile contains the rows that could not be loaded due to errors (for example, a date was not valid for the format model) and the discardfile contains the rows excluded by the condition in the LOAD WHEN clause.
The default behaviour is to create these files in the same directory as the data file. However, if we only have read access to this directory, we can use these clauses to create the files in another location. In our example, we have chosen to write the files to the /logs directory.
If we wish, we can create another external table to view the contents of these files. Here, we define a table to query the log file, which contains a single column with a fixed width of four thousand bytes.
CREATE TABLE bank_hol_rej_ext (
rec VARCHAR2(4000))
ORGANIZATION EXTERNAL
(DEFAULT DIRECTORY log_dir
ACCESS PARAMETERS(
records delimited by newline
fields
missing field VALUES ARE NULL
(
rec position(1:4000)
)
)
LOCATION('bank_holidays.log'));Table created.
SELECT *
FROM bank_hol_rej_ext
WHERE rownum <=10
AND rec IS NOT NULL;REC
---------------------------------------------------------------
LOG file opened at 03/04/08 21:23:35
Field Definitions for table BANK_HOL_EXT
Record format DELIMITED BY NEWLINE
Data in file has same endianness as the platform
Rows with all null fields are accepted
Load when (HOL_2007 != BLANKS)
Fields in Data Source:
HOL_DESC CHAR (255)
Terminated by " "
Enclosed by """ and """10 rows selected
We can change the table definition to reference a different file using the ALTER TABLE command.
This allows us to query the discard file with the same external table.ALTER TABLE bank_hol_rej_ext LOCATION ('bank_holidays.dsc');Table altered.
SELECT *
FROM bank_hol_rej_ext;REC
----------------------------------
"(First Monday in May)"
"(Last Monday in May)"
"(Last Monday in August)"3 rows selected.
Data Pump
Oracle 10g introduced the option to create an external table using data constructed from within the database.CREATE TABLE emp_ext
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY tmp_dir
LOCATION ('emp.dp')
)
AS
SELECT empno, ename, job, sal, comm, dname
FROM emp JOIN dept USING (deptno);Table created.
This SQL will create a flat file in the /tmp directory containing the results of the SELECT statement. Unfortunately, the file is a proprietary binary format which can only be read by Data Pump. It would be useful to have the option to produce a pure CSV format which could easily be read by other tools and utilities. However, the file is readable across platforms, which allows us to move it between Windows and Linux and create another external table on the target machine, to read the data.
CREATE TABLE emp_ext (
empno NUMBER(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
sal NUMBER(7,2),
comm NUMBER(7,2),
dname VARCHAR2(14)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_DATAPUMP
DEFAULT DIRECTORY tmp_dir
LOCATION ('emp.dp')
);Table created.
This can be useful for transferring arbitrary data from one database to another where a database link is not available.
0 comments:
Post a Comment