Tuesday, 20 May 2008

Oracle Pipelined Functions

In a previous article we saw that a collection can be returned from a PL/SQL function and used in a query as if it were a table.

Rows from a collection returned by a table function can also be pipelined. This means that the rows are returned to the client as they are produced, rather than all at once when the function has run to completion. This can improve performance, especially with large result sets, because the client can start to process the results without having to wait for the function to finish.

CSV to Column

We will develop a simple pipelined function to read data from an external table which contains the following contact data.
"mrblond@abc.com","m","25"," movies, music"
"mrsred@jkl.com","f","33","cooking , reading, yoga, travel"
"mrpink@def.com","m","31"," movies, reading,sport"
"mrsbrown@ghi.com","f","22","cycling, swimming"
The function will pivot the comma separated list of interests and present them in a column which contains a single interest like so.
mrblond@abc.com         m     25    movies
mrblond@abc.com m 25 music
mrsbrown@ghi.com f 22 cycling
mrsbrown@ghi.com f 22 swimming
mrpink@def.com m 31 movies
mrpink@def.com m 31 reading
mrpink@def.com m 31 sport
First we create a directory object and an external table to access the contact details in the flat file.
CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';
Directory created.

CREATE TABLE contact_ext(
email VARCHAR2(100),
sex CHAR(1),
age NUMBER,
interests VARCHAR2(500)
)
ORGANIZATION EXTERNAL
(DEFAULT DIRECTORY tmp_dir
ACCESS PARAMETERS(
records delimited BY newline
fields terminated BY ','
optionally enclosed BY '"')
LOCATION('contacts.txt')
);
Table created.
Next we create a contact object to represent the new table structure and a nested table of this type.
CREATE OR REPLACE type contact_type AS OBJECT
(email VARCHAR2(100),
sex char(1),
age NUMBER,
interest VARCHAR(50));
Type created.

CREATE OR REPLACE TYPE contact_table_type AS
TABLE OF contact_type;
Type created.
The table function is then created using the nested table as the return type. The PIPELINED keyword indicates that the function will return rows iteratively.
CREATE OR REPLACE FUNCTION contact_list 
RETURN contact_table_type PIPELINED
AS
l_interests dbms_utility.uncl_array;
l_count NUMBER;
BEGIN
  FOR rec IN
(SELECT *
FROM contact_ext)
LOOP
dbms_utility.comma_to_table(list => rec.interests,
tablen => l_count,
tab => l_interests);
    FOR i IN 1 .. l_count
LOOP
PIPE ROW (CONTACT_TYPE(rec.email,
rec.sex,
rec.age,
TRIM(l_interests(i))));
END LOOP;
END LOOP;
END;
Function created.
We take each CSV list of values from the interests column of CONTACT_EXT and convert it into a nested table using the COMMA_TO_TABLE procedure of the DBMS_UTILITY package. We then iterate through the nested table and return a series of CONTACT_TYPEs using the PIPE ROW command.

This procedure can now be used as a table in a query.
SELECT * FROM TABLE (contact_list);
EMAIL                   SEX     AGE    INTEREST      
----------------------- ------- ------ -------------
mrblond@abc.com m 25 movies
mrblond@abc.com m 25 music
mrsbrown@ghi.com f 22 cycling
mrsbrown@ghi.com f 22 swimming
mrpink@def.com m 31 movies
mrpink@def.com m 31 reading
mrpink@def.com m 31 sport
mrsred@jkl.com f 33 cooking
mrsred@jkl.com f 33 reading
mrsred@jkl.com f 33 yoga
mrsred@jkl.com f 33 travel
11 rows selected

Column to CSV

Now let's do this in reverse. Suppose we are given the data in the above format and we need to return each email address on a separate line with the interests shown as a comma separated list.

We could use the SYS_CONNECT_BY_PATH function to provide an SQL only solution but for the purposes of this exercise we will develop a bespoke PL/SQL pipelined function.

First, we create a table to work with.
CREATE TABLE contact
AS SELECT * FROM TABLE (contact_list);
Table created.
This time, the function will take the contents of the table column and build up a comma separated string of values for each email address. The LEAD function helps us to identify when an email is about to change by looking at the next email address in the result set. Once again, rows are returned as they are produced using the PIPE ROW command.
CREATE OR REPLACE
FUNCTION contact_csv
RETURN contact_table_type PIPELINED
AS
l_interests VARCHAR2(1000);
l_count NUMBER;
BEGIN
  FOR row IN
(SELECT
email,
LEAD(email, 1) OVER (ORDER BY email) AS next_email,
sex,
age,
interest
FROM contact
ORDER BY email)
LOOP
    l_interests := l_interests || ',' || row.interest;
    IF (row.email != row.next_email OR
row.next_email IS NULL) THEN
PIPE ROW (CONTACT_TYPE(row.email,
row.sex,
row.age,
LTRIM(l_interests,',')));
l_interests := null;
END IF;
END LOOP;
END;
Function created.
We can confirm the results using the following SQL.
SELECT * 
FROM TABLE(contact_csv);
EMAIL                  SEX     AGE     INTEREST      
---------------------- ------- ------- ------------------------------
mrblond@abc.com m 25 movies,music
mrpink@def.com m 31 movies,reading,sport
mrsbrown@ghi.com f 22 cycling,swimming
mrsred@jkl.com f 33 cooking,reading,yoga,travel
4 rows selected

Summary

Oracle pipelined functions are table functions which return data iteratively. This allows the client to start processing the output from the function call before the entire collection has been populated.

How to pivot a column to a CSV list in Oracle

Suppose we have a column of data that we wish to display as a CSV list. One solution is to develop a pipelined table function in PL/SQL, but how can we achieve the same results using just SQL? This article shows how the Oracle SYS_CONNECT_BY_PATH function can be used to solve the problem.

Example

Using the employee table from the Oracle demo schema, we can easily display a list of employee names for each department.

SELECT deptno, ename
FROM emp
ORDER BY deptno;
DEPTNO        ENAME      
------------- ----------
10 CLARK
10 KING
10 MILLER
20 SMITH
20 ADAMS
20 FORD
20 SCOTT
20 JONES
30 ALLEN
30 BLAKE
30 MARTIN
30 JAMES
30 TURNER
30 WARD
14 rows selected
But suppose we wish to format the names as a comma separated list as follows.
DEPTNO       EMPLOYEES
------------ ------------------------------------
10 CLARK,KING,MILLER
20 SMITH,ADAMS,FORD,SCOTT,JONES
30 ALLEN,BLAKE,MARTIN,JAMES,TURNER,WARD
3 rows selected

SYS_CONNECT_BY_PATH

The technique described here uses the SYS_CONNECT_BY_PATH function to provide the CSV column. Let's remind ourselves of what SYS_CONNECT_BY_PATH actually does by displaying a a hierarchy of employees.
SELECT 
empno,
mgr,
level,
SYS_CONNECT_BY_PATH(ename, '/') path
FROM emp
START WITH ename ='KING'
CONNECT BY PRIOR empno = mgr;
EMPNO   MGR        LEVEL     PATH
------- ---------- --------- --------------------------
7839 1 /KING
7566 7839 2 /KING/JONES
7788 7566 3 /KING/JONES/SCOTT
7876 7788 4 /KING/JONES/SCOTT/ADAMS
7902 7566 3 /KING/JONES/FORD
7369 7902 4 /KING/JONES/FORD/SMITH
7698 7839 2 /KING/BLAKE
7499 7698 3 /KING/BLAKE/ALLEN
7521 7698 3 /KING/BLAKE/WARD
7654 7698 3 /KING/BLAKE/MARTIN
7844 7698 3 /KING/BLAKE/TURNER
7900 7698 3 /KING/BLAKE/JAMES
7782 7839 2 /KING/CLARK
7934 7782 3 /KING/CLARK/MILLER
14 rows selected
SYS_CONNECT_BY_PATH returns the path of a column value from root to node, with column values separated by a '/' (or whichever character we specify) for each row returned by CONNECT BY condition.

Now this function is only valid in hierarchical queries and unfortunately, in our example, there is no natural hierarchy in our data that we can use. Therefore, we must manufacture one using analytic functions.

Analytic Functions

Let's add a couple more columns to our original query.
SELECT deptno,
ename,
row_number() over(PARTITION BY deptno ORDER BY deptno) rnum,
count(*) over(PARTITION BY deptno) tot
FROM emp;
DEPTNO       ENAME      RNUM  TOT                  
------------ ---------- ------ -----
10 CLARK 1 3
10 KING 2 3
10 MILLER 3 3
20 SMITH 1 5
20 ADAMS 2 5
20 FORD 3 5
20 SCOTT 4 5
20 JONES 5 5
30 ALLEN 1 6
30 BLAKE 2 6
30 MARTIN 3 6
30 JAMES 4 6
30 TURNER 5 6
30 WARD 6 6
14 rows selected
The RNUM column creates an artificial hierarchy within department which allows us to identify the parent using rnum – 1. TOT tells us the total number of employees for the current department. We use this to identify the employees who are at the bottom of the hierarchy (where rnum=tot).

Complete Solution

So, putting all this together, we get the following SQL.
SELECT 
deptno,
LTRIM(SYS_CONNECT_BY_PATH(ename, ','), ',') employees
FROM (
SELECT deptno,
ename,
row_number() over(PARTITION BY deptno ORDER BY deptno) rnum,
count(*) over(PARTITION BY deptno) tot
FROM emp
)
WHERE rnum=tot
START WITH rnum = 1
CONNECT BY PRIOR rnum = rnum -1 AND PRIOR deptno = deptno;
DEPTNO       EMPLOYEES
------------ ------------------------------------
10 CLARK,KING,MILLER
20 SMITH,ADAMS,FORD,SCOTT,JONES
30 ALLEN,BLAKE,MARTIN,JAMES,TURNER,WARD
3 rows selected
We create an arbitrary hierarchy of employees within department and select only those employees at the bottom of the hierarchy. Then we use SYS_CONNECT_BY_PATH with a comma delimiter to format the employee names as a CSV list. LTRIM removes the first comma from our list.

Clearly, this is not the most intuitive SQL statement ever written. However, we could use this statement to create a view which would simplify access to the data. Alternatively, we could write a bespoke pipelined table function to provide a programmatic solution.

Oracle Quick Tip: Querying Buffer Cache Usage

This SQL statement shows which objects in our schema are using the most blocks in the data buffer cache.
SELECT object_name, object_type type, COUNT(*) blocks
FROM v$bh
JOIN user_objects ON object_id = objd
GROUP BY object_name, object_type
ORDER BY blocks DESC;
OBJECT_NAME                   TYPE     BLOCKS 
----------------------------- -------- ------
MGMT_METRICS_RAW_PK INDEX 1960
MGMT_METRICS_1HOUR_PK INDEX 1183
MGMT_METRICS_1DAY_PK INDEX 349
MGMT_METRICS TABLE 149
MGMT_SYSTEM_PERF_LOG_IDX_01 INDEX 62
MGMT_CURRENT_METRICS_PK INDEX 29
MGMT_SYSTEM_PERFORMANCE_LOG TABLE 21
MGMT_TARGETS TABLE 14
MGMT_METRIC_COLLECTIONS TABLE 14
MGMT_JOB_EMD_STATUS_QUEUE TABLE 14
To display all objects, regardless of schema, use DBA_OBJECTS instead of USER_OBJECTS.

Oracle Table Functions

An Oracle table function returns a collection that can be queried as if it were a table.

For example.
SELECT * FROM TABLE(my_function);
The TABLE keyword informs Oracle that the collection produced by my_function should be treated as a table in this query.

Example

To demonstrate this, let's create a simple function which generates a list of dates for a specified range, which could be used as follows.
SELECT column_value gen_date
FROM TABLE (date_list('1-jan-08', '5-jan-08'));
gen_date                   
----------------
01-jan-08
02-jan-08
03-jan-08
04-jan-08
05-jan-08
5 rows selected
Incidentally, this article provides a simple solution for generating rows, including dates in a range, using the CONNECT BY LEVEL clause.

The DATE_LIST function

First, we create a collection to be used as the return type from the function. Here we use a nested table but a VARRAY would also be valid.
CREATE TYPE interval_tab
AS TABLE OF date;
/
Type created.
Now we can go ahead and create the date_list function.
CREATE OR REPLACE FUNCTION date_list(p_start_date IN DATE, 
p_end_date IN DATE)
RETURN interval_tab AS
l_interval interval_tab := interval_tab();
l_days binary_integer := p_end_date - p_start_date + 1;
BEGIN
IF (l_days < 0) THEN
RETURN null;
END IF;
l_interval.extend(l_days);
FOR i IN 1 .. l_days
LOOP
l_interval(i) := p_start_date + i -1;
END LOOP;
RETURN l_interval;
END;
/
Function created.
This function populates a nested table by adding an incrementing value to the start date.

Using the table function

One use for this function would be to generate a list of dates to be used in an outer join. For example, the following query displays the number of schema objects created on each day of a specified date range.
SELECT to_char(column_value, 'dd-Mon-yy') created, nvl(cnt, 0) cnt
FROM TABLE(date_list('1-feb-08', '5-feb-08'))
LEFT OUTER JOIN
(SELECT TRUNC(created, 'dd') created, COUNT(*) cnt
FROM all_objects
GROUP BY TRUNC(created, 'dd'))
ON created = column_value
ORDER BY column_value;
CREATED        CNT                    
--------- --------
01-Feb-08 4
02-Feb-08 0
03-Feb-08 0
04-Feb-08 1
05-Feb-08 2
5 rows selected
Using the date_list function and an outer join we can display a row per day, including those days that have no entries in the ALL_OBJECTS table.

Oracle table functions can also be pipelined which allows the data to be returned iteratively as it is produced, rather than all at once when the function has run to completion.