Tuesday, 20 May 2008

Oracle Pipelined Functions

In a previous article we saw that a collection can be returned from a PL/SQL function and used in a query as if it were a table.

Rows from a collection returned by a table function can also be pipelined. This means that the rows are returned to the client as they are produced, rather than all at once when the function has run to completion. This can improve performance, especially with large result sets, because the client can start to process the results without having to wait for the function to finish.

CSV to Column

We will develop a simple pipelined function to read data from an external table which contains the following contact data.
"mrblond@abc.com","m","25"," movies, music"
"mrsred@jkl.com","f","33","cooking , reading, yoga, travel"
"mrpink@def.com","m","31"," movies, reading,sport"
"mrsbrown@ghi.com","f","22","cycling, swimming"
The function will pivot the comma separated list of interests and present them in a column which contains a single interest like so.
mrblond@abc.com         m     25    movies
mrblond@abc.com m 25 music
mrsbrown@ghi.com f 22 cycling
mrsbrown@ghi.com f 22 swimming
mrpink@def.com m 31 movies
mrpink@def.com m 31 reading
mrpink@def.com m 31 sport
First we create a directory object and an external table to access the contact details in the flat file.
CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';
Directory created.

CREATE TABLE contact_ext(
email VARCHAR2(100),
sex CHAR(1),
age NUMBER,
interests VARCHAR2(500)
)
ORGANIZATION EXTERNAL
(DEFAULT DIRECTORY tmp_dir
ACCESS PARAMETERS(
records delimited BY newline
fields terminated BY ','
optionally enclosed BY '"')
LOCATION('contacts.txt')
);
Table created.
Next we create a contact object to represent the new table structure and a nested table of this type.
CREATE OR REPLACE type contact_type AS OBJECT
(email VARCHAR2(100),
sex char(1),
age NUMBER,
interest VARCHAR(50));
Type created.

CREATE OR REPLACE TYPE contact_table_type AS
TABLE OF contact_type;
Type created.
The table function is then created using the nested table as the return type. The PIPELINED keyword indicates that the function will return rows iteratively.
CREATE OR REPLACE FUNCTION contact_list 
RETURN contact_table_type PIPELINED
AS
l_interests dbms_utility.uncl_array;
l_count NUMBER;
BEGIN
  FOR rec IN
(SELECT *
FROM contact_ext)
LOOP
dbms_utility.comma_to_table(list => rec.interests,
tablen => l_count,
tab => l_interests);
    FOR i IN 1 .. l_count
LOOP
PIPE ROW (CONTACT_TYPE(rec.email,
rec.sex,
rec.age,
TRIM(l_interests(i))));
END LOOP;
END LOOP;
END;
Function created.
We take each CSV list of values from the interests column of CONTACT_EXT and convert it into a nested table using the COMMA_TO_TABLE procedure of the DBMS_UTILITY package. We then iterate through the nested table and return a series of CONTACT_TYPEs using the PIPE ROW command.

This procedure can now be used as a table in a query.
SELECT * FROM TABLE (contact_list);
EMAIL                   SEX     AGE    INTEREST      
----------------------- ------- ------ -------------
mrblond@abc.com m 25 movies
mrblond@abc.com m 25 music
mrsbrown@ghi.com f 22 cycling
mrsbrown@ghi.com f 22 swimming
mrpink@def.com m 31 movies
mrpink@def.com m 31 reading
mrpink@def.com m 31 sport
mrsred@jkl.com f 33 cooking
mrsred@jkl.com f 33 reading
mrsred@jkl.com f 33 yoga
mrsred@jkl.com f 33 travel
11 rows selected

Column to CSV

Now let's do this in reverse. Suppose we are given the data in the above format and we need to return each email address on a separate line with the interests shown as a comma separated list.

We could use the SYS_CONNECT_BY_PATH function to provide an SQL only solution but for the purposes of this exercise we will develop a bespoke PL/SQL pipelined function.

First, we create a table to work with.
CREATE TABLE contact
AS SELECT * FROM TABLE (contact_list);
Table created.
This time, the function will take the contents of the table column and build up a comma separated string of values for each email address. The LEAD function helps us to identify when an email is about to change by looking at the next email address in the result set. Once again, rows are returned as they are produced using the PIPE ROW command.
CREATE OR REPLACE
FUNCTION contact_csv
RETURN contact_table_type PIPELINED
AS
l_interests VARCHAR2(1000);
l_count NUMBER;
BEGIN
  FOR row IN
(SELECT
email,
LEAD(email, 1) OVER (ORDER BY email) AS next_email,
sex,
age,
interest
FROM contact
ORDER BY email)
LOOP
    l_interests := l_interests || ',' || row.interest;
    IF (row.email != row.next_email OR
row.next_email IS NULL) THEN
PIPE ROW (CONTACT_TYPE(row.email,
row.sex,
row.age,
LTRIM(l_interests,',')));
l_interests := null;
END IF;
END LOOP;
END;
Function created.
We can confirm the results using the following SQL.
SELECT * 
FROM TABLE(contact_csv);
EMAIL                  SEX     AGE     INTEREST      
---------------------- ------- ------- ------------------------------
mrblond@abc.com m 25 movies,music
mrpink@def.com m 31 movies,reading,sport
mrsbrown@ghi.com f 22 cycling,swimming
mrsred@jkl.com f 33 cooking,reading,yoga,travel
4 rows selected

Summary

Oracle pipelined functions are table functions which return data iteratively. This allows the client to start processing the output from the function call before the entire collection has been populated.

1 comment:

UK said...

Hello Mark,

I was reading your blob and found very informative was wondering if you can give some insights on the program that I am working on .

I am planning to create a inhouse monitoring tool which will run against some v$ performance views every 10secs , my question is what is the best way to calculate a delta value from a table which keeps the cumalative values only such as v$sysstat.
The idea is to collect the values from these dynamic tables every 10 secs, calculate the delta and store the detal in a temporary table from where it will be displayed every 15 secs to the DBA who will be monitoring the database.
Please advice.