Saturday, 1 March 2008

How to Generate Rows in Oracle


Here is a a simple and efficient way to generate arbitrary rows in Oracle.
SELECT <columns>
FROM dual
CONNECT BY LEVEL <= n;
where n is the number of rows you wish to produce.

For example.
SELECT rownum 
FROM dual
CONNECT BY LEVEL <= 5;
    ROWNUM
----------
1
2
3
4
5
5 rows selected

Test Data

One use for this technique is to quickly generate the contents of a table. The following example uses a CTAS statement to create a table and populate it with test data.

CREATE TABLE test_data(
id PRIMARY KEY,
group_id NOT NULL,
created_at NOT NULL,
text NOT NULL) AS
SELECT rownum,
MOD(rownum, 5),
TO_DATE('1-jan-07', 'dd-mon-yy') + INTERVAL '1' MINUTE * rownum,
CAST ('xyz' || rownum AS VARCHAR2(50))
FROM dual
CONNECT BY LEVEL <= 10000;
Table Created.

SELECT * 
FROM test_data
WHERE id <= 10;
        ID   GROUP_ID  CREATED_AT          TEXT
---------- ---------- ------------------- -----------
1 1 01-jan-07 00:01:00 xyz1
2 2 01-jan-07 00:02:00 xyz2
3 3 01-jan-07 00:03:00 xyz3
4 4 01-jan-07 00:04:00 xyz4
5 0 01-jan-07 00:05:00 xyz5
6 1 01-jan-07 00:06:00 xyz6
7 2 01-jan-07 00:07:00 xyz7
8 3 01-jan-07 00:08:00 xyz8
9 4 01-jan-07 00:09:00 xyz9
10 0 01-jan-07 00:10:00 xyz10
10 rows selected.

We now have ten thousand rows of test data with a sequential number for the primary key, a group item which contains five distinct values, a date field which increments by one minute for each row and a text field containing unique values.

Sparse Data

Another way we can utilise row generation is when dealing with sparse data.

Suppose we wish to use the data from the previous example to show the total number of rows by day.

SELECT TRUNC(created_at, 'dd') created_on, count(*)
FROM test_data
GROUP BY TRUNC(created_at, 'dd');
CREATED_ON   COUNT(*)
---------- ----------
01-jan-07 1439
02-jan-07 1440
03-jan-07 1440
04-jan-07 1440
05-jan-07 1440
06-jan-07 1440
07-jan-07 1361
7 rows selected.

However, let us assume that we must display a row for every day within a specified date range, even if the row count is zero. This is often the case if the data is used to produce a graph, for example.

It just so happens that our query produced a row per day between the 1st and 7th of January, but suppose the required date range is the 1st to the 10th of January and our table has a couple of days with no data. We will delete the rows for the 3rd and 5th of January to simulate this.

DELETE FROM test_data 
WHERE TRUNC(created_at, 'dd') IN (TO_DATE('3-jan-2007', 'dd-mon-yyyy'),
TO_DATE('5-jan-2007', 'dd-mon-yyyy'));
2880 rows deleted.

SELECT TRUNC(created_at, 'dd') created_on, count(*)
FROM test_data
GROUP BY TRUNC(created_at, 'dd');
CREATED_ON   COUNT(*)
---------- ---------
01-jan-07 1439
02-jan-07 1440
04-jan-07 1440
06-jan-07 1440
07-jan-07 1361
5 rows selected.

Because we are dealing with sparse data, row counts for only five days are returned by our query. However, the requirement is to produce a row for each of the ten days in our range. We will use the CONNECT BY LEVEL clause, inline views and an outer join to provide the solution.

First, we construct a query to return a row for each day in a specified range.

SELECT TO_DATE('1-jan-2007', 'dd-mon-yyyy') + (rownum - 1) created_on
FROM dual
CONNECT BY LEVEL <= TO_DATE('10-jan-2007', 'dd-mon-yyyy') -
TO_DATE('1-jan-2007', 'dd-mon-yyyy') + 1;
CREATED_ON
----------
01-jan-07
02-jan-07
03-jan-07
04-jan-07
05-jan-07
06-jan-07
07-jan-07
08-jan-07
09-jan-07
10-jan-07
10 rows selected.
This query is then joined with an aggregate query on the test data using an outer join.
SELECT created_on,
NVL(qty, 0) qty
FROM
(SELECT TO_DATE('1-jan-2007', 'dd-mon-yyyy') + (rownum - 1) created_on
FROM dual
CONNECT BY LEVEL <= TO_DATE('10-jan-2007', 'dd-mon-yyyy') -
TO_DATE('1-jan-2007', 'dd-mon-yyyy') + 1)
LEFT OUTER JOIN
(SELECT TRUNC(created_at, 'dd') created_on,
COUNT(*) qty
FROM test_data
GROUP BY TRUNC(created_at, 'dd'))
USING(created_on)
ORDER BY created_on;
CREATED_ON        QTY
---------- ----------
01-jan-07 1439
02-jan-07 1440
03-jan-07 0
04-jan-07 1440
05-jan-07 0
06-jan-07 1440
07-jan-07 1361
08-jan-07 0
09-jan-07 0
10-jan-07 0
10 rows selected.

Now we are guaranteed to return a row for every day in our range, even for the days with no data.

6 comments:

Anonymous said...

Thank you very much!!!

Mark Lishman said...

You're welcome

o r said...

Thanks a lot for CONNECT BY LEVEL!

Anonymous said...

Very Interesting!
Thank You!

Anonymous said...

Hi,

I have a inquiry for the webmaster/admin here at blog.lishman.com.

Can I use some of the information from your post above if I provide a link back to your website?

Thanks,
Harry

gachon said...

hi,

thank you for this article.
I am trying to use it but I need to split for each minute for the last hours

how can I build the query extract all minutes from dual ?

thank you
Gachon