KayeILSQL-08-Subqueries

Contents

Based On

Subqueries in General

The Subquery Structure

The Notation We Will Use

Subquery Operators

SINGLE-ROW Subqueries

MULTIPLE-ROW Subqueries

NULL Values in a Subquery

FROM Clause Query / INLINE VIEW

TOP-N Query

CORRELATED Subqueries

References

Based On

Based On — Experience

The contents of this work are based on;

Based On — European Parliament as Data Model

This text on SQL is based on the EP-European Parliament as Data Model.

The European Parliament (EP) is the parliamentary institution of the European Union (EU).

The European Parliament is elected by the citizens of the European Union to represent their interests.

Its origins go back to the 1950s and the founding Treaties. Since 1979 its members have been directly elected by the citizens of the EU.

Elections are held every five years, and every EU citizen is entitled to vote, and to stand as a candidate, wherever they live in the EU. Parliament thus expresses the democratic will of the European Union's nearly 500 million citizens and it represents their interests in discussions with the other EU institutions.

Please Note:

The sources for the data, which are all freely obtainable from the European Union website, and from several leaflets of the EU institutions, are listed in the References section.

These sources are used to create the fictitious Business Narrative of the "Data Modelling and Normalization" Chapter.

We called this Business Narrative; "the Story of the European Union" which explains about the EU and EP, for purposes of Data Modeling.

The Story of the European Union is the last Chapter of Kaye is Learning SQL.

Based On — Oracle Database as RDBMS

This text on SQL is based on the Oracle Database as RDBMS — Relational Database Management System.

Although the SQL Statements used in this text, apply specifically to the Oracle Database implementation of SQL, they are generally compatible with many other RDBMS SQL implementations.

The knowledge in the following sections are based on my personal work experience, as well as on several Oracle Documentation, which are freely downloadable from Oracle websites.

You can find a list of these documents in the References.

The name of the documents may change in time.

You can refer to the Oracle Documentation if you need more detailed information on any of the topics in this text.

Any errors present, are mine.

Breath of Fresh Air

Subqueries in General

A Query is an operation that retrieves data from one or more tables or views.

We call a Query NESTED within another SQL Statement a Subquery.

Report Requirements

We want to write a report that displays those countries which joined the EU later than Sweden.

Solution with Two Queries

We initially have to find out Sweden's join date.

We write the following query.

This is our first query.

First Query
select join_date
from countries
where country_id='SE';         
    

Here is the result.

gen_01.jpg

Sweden's join date is '01-JAN-95'.

We then write a second query, which retrieves countries with JOIN_DATE > '01-JAN-95'.

Second Query
select country_id,		
country_name,		
join_date,		
gdp_pps,		
gdp 
from countries
where status='M' 
and join_date > '01-JAN-95'
order by join_date, country_id;       
    

Here is the result.

Here are the countries which joined the EU later than Sweden.

gen_02.jpg

We wrote the report required from us, in two steps, using two consecutive queries.

First Query In the First Query, we retrieved Sweden's join date: '01-JAN-95'

Second Query In the Second Query, we used the Output of the First Query, in the Comparison Operator > (join_date > '01-JAN-95') to complete the report.

However, we can solve this problem by using Only One Query.

Solution with a Subquery

We can solve the problem by writing Only One Query.

This is where the Subquery comes in.

Here is the Query We Use:
select 	country_id,
country_name,
join_date,
gdp_pps,
gdp 
from countries
where status='M' 
and join_date > (select	join_date
                from countries
                where country_id='SE')
order by join_date, country_id;    

We have exactly the same result as before.

gen_03.jpg

Let's examine this query.

The Subquery Structure

We will now examine the structure of a Subquery, based on the Solution of the Report above.

select 	country_id,
country_name,
join_date,
gdp_pps,
gdp 
from countries
where status='M' 
and join_date > (select join_date
                    from countries
                    where country_id='SE')
order by join_date, country_id;    

The script in black is the TOP-LEVEL QUERY.

The TOP-LEVEL QUERY can also be called; the MAIN QUERY / OUTER QUERY / PARENT QUERY / PARENT STATEMENT.

The script (in red and within brackets) is the SUBQUERY.

The SUBQUERY is nested within the PARENT STATEMENT.

A Query Nested within another SQL Statement is called a SUBQUERY.

A SUBQUERY can also be called an INNER QUERY.

The PARENT STATEMENT of a SUBQUERY is NOT ALWAYS a SELECT Statement.

The PARENT STATEMENT (TOP-LEVEL SQL Statement) in which the SUBQUERY is NESTED, can ALSO be an INSERT, UPDATE or a DELETE Statement.

A SUBQUERY can contain ANOTHER SUBQUERY.

You can nest up to 255 levels of Subqueries in a Nested Subquery.

For the limit on the number of Nested Subqueries within Subqueries, you can check the latest Oracle Documentation.

In any Subquery, OTHER THAN a Correlated Subquery, the INNER Query (the SUBQUERY);

The order of execution is like two sequential queries:

A Subquery can be executed in a:

The Notation We Will Use

There are many alternative terms for the Top-Level Query and the Subquery.

A Top-Level Query can also be called a Main Query / Outer Query / Parent Query / Parent Statement.

A Subquery can also be called a Child Query / Inner Query.

In this Chapter we will mostly use the following terms:

Subquery Operators

Below is the Basic Syntax of a Subquery.

Please keep in mind that there are other syntaxes.

This syntax helps us to understand the subquery operators.

SELECT select_list
FROM table_name
WHERE expression subquery operator 
        (SELECT select_list
        FROM	table_name)

The subquery operator can be;

Single-Row Operator

These are the Single-Row Operators:

Single-Row Operators are used with Single-Row Subqueries.

A Subquery which returns Only One Row from the Inner Select Statement is a Single-Row Subquery.

Multiple-Row Operator

These are the Multiple-Row Operators:

Multiple-Row Operators are used with Multiple-Row Subqueries.

A Subquery which returns More Than One Row from the Inner Select Statement is a Multiple-Row Subquery.

SINGLE-ROW Subqueries

SINGLE-ROW Subquery Example 1

Retrieve those countries that joined the EU On The Same Date as the country with id 'SE'.

select 	country_id,
country_name,
capital,
join_date
from countries
where status='M' 
and join_date = (select join_date
                from countries
                where country_id='SE')
order by country_id;  

Here is the output of this SELECT Statement.

sr_01.jpg

SINGLE-ROW Subquery Example 2

Retrieve those countries that joined the EU Before The Joining Date of the country with id 'SE'.

select 	country_id,
country_name,
capital,
join_date
from countries
where status='M' 
and join_date < (select join_date
                from countries
                where country_id='SE')
order by join_date, country_id;   

Here is the output of this SELECT Statement.

sr_02.jpg

SINGLE-ROW Subquery Example 3

Retrieve the country which has the Minimum Purchasing Power Standard among the EU Member Countries.

select country_id,
country_name,
capital,
gdp_pps,
gdp,
population,
join_date
from countries
where	status='M' 
and	gdp_pps = 
    (select min(gdp_pps)
    from countries
    where status='M');  

Here is the output of this SELECT Statement.

An Aggregate Function (Group Function) is used in the subquery to select the minimum purchasing power standard MIN(GDP_PPS) among the member countries.

sr_03.jpg

SINGLE-ROW Subquery Example 4

Retrieve the details of those countries which joined the EU before Sweden(SE), and which have a purchasing power standard greater than the United Kingdom(UK).

select 	country_id,
country_name,
capital,
join_date,
gdp_pps
from 	countries
where	status='M' 
and	join_date < 
        (select join_date
        from 	countries
        where	country_id='SE')
and	gdp_pps >
        (select gdp_pps
        from	countries
        where	country_id='UK')
order by gdp_pps desc;  

This SELECT Statement uses two subqueries:

  1. to retrieve the details of those countries which joined the EU before Sweden(SE),
  2. AND
  3. to retrieve the details of those countries which have a purchasing power standard greater than the United Kingdom(UK).

sr_04.jpg

SINGLE-ROW Subquery Example 5

In our database, in the COUNTRIES table, there are three distinct values for the column STATUS.

These three distinct values of the STATUS column are:

  1. M for the Member Countries
  2. C for the Candidate Countries
  3. O for the Others

We can say that we have three categories of STATUS.

We want retrieve the minimum gdp_pps for different categories of STATUS, IF these minimum values of gdp_pps are GREATER THAN the minimum gdp_pps of countries with a status of 'C'.

select	status,	
min(gdp_pps)
from countries	
group by status
having	min(gdp_pps) >
    (select min(gdp_pps)
    from countries
    where	status='C');

Here is the output of the query.

The INNER Query returns the "minimum gdp_pps MIN(GDP_PPS) of Candidate countries".

HAVING Clause eliminates those rows in the MAIN Query, that have minimum gdp_pps values less than or equal to the value returned by the INNER Query.

sr_05.jpg

SINGLE-ROW Subquery Example 6

The SELECT statement below wants to retrieve those countries which joined the EU on the same date as the country 'SWEDEN'.

select	country_id,
country_name,
gdp_pps,
join_date
from	countries
where	join_date =
        (select	join_date
        from	countries
        where	country_name='SWEDEN');   

Here is the output of the SELECT statement.

We can see that NO Rows are retrieved.

sr_06.jpg

Below is the Subquery.

select	join_date
from	countries
where	country_name='SWEDEN'

We can see below that the Subquery Returns NO Rows.

This is because, the country name is stored as 'Sweden' which is INITCAP, and NOT in UPPERCASE.

The value of the JOIN_DATE retuened is NULL.

sr_06_2.jpg

The MAIN Query compares NULL with the values of the JOIN_DATE column in the whole table, using the EQUAL TO = operator.

where join_date =
    (select join_date
    from countries
    where country_name='SWEDEN');      

In fact, there is more than one country which have NULL values in their JOIN_DATE columns. These are the countries with status of 'C' and 'O'.

Comparison of two NULLS using the "= Operator" (and also using the other Comparison Operators) yields a NULL.

Therefore, the condition above will yield a NULL.

A row will ONLY be RETURNED if the condition yields a TRUE, therefore NO ROWS are returned.

SINGLE-ROW Subquery Example 7

Within the set of countries when grouped by STATUS (M=Member, C=Candidate, O=Others), we want to display ONLY the sets with the highest average gdp_pps.

We want the report to display the "STATUS", "minimum gdp_pps", "average gdp_pps", and the "maximum gdp_pps" of such a set.

The query below, with a HAVING Clause retrieves the required information.

select	status,
MIN(GDP_PPS),  
AVG(gdp_pps), 
MAX(GDP_PPS)
from countries
group by status
having AVG(gdp_pps) = 
            (select max(avg(gdp_pps))
            from	countries
            group by status)
order by status  

We can see the output of this query below.

sr_07.jpg

Below is another query giving the same result.

select	status,
MIN(GDP_PPS),  
AVG(gdp_pps), 
MAX(GDP_PPS)
from countries
group by status
having AVG(gdp_pps) = 
            (select max(avgdp) from
                (select status, 
                avg(gdp_pps) avgdp
                from countries
                group by status)
            )
order by status  

MULTIPLE-ROW Subqueries

MULTIPLE-ROW Subqueries return MORE THAN ONE Row.

MULTIPLE-ROW Operators:

MULTIPLE-ROW IN

We want to retrieve those countries which have the HIGHEST GDP(Gross Domestic Product) within each STATUS ('M', 'C', 'O') Group.

We write the following query.

SELECT	COUNTRY_ID,
COUNTRY_NAME,	
CASE	STATUS
    WHEN 'M' THEN 'Member'
    WHEN 'C' THEN 'Candidate'
    WHEN 'O' THEN 'Other'
END STATUS_CLASS,
GDP
FROM	COUNTRIES
WHERE	GDP =
    (SELECT	MAX(GDP)
    FROM	COUNTRIES
    GROUP BY STATUS)
ORDER BY GDP DESC;

Here is the output.

mr_01.jpg

We receive the following error:

ORA-01427: single-row subquery returns more than one row
01427. 00000 - "single-row subquery returns more than one row"

We receive this error, because the Subquery returns multiple rows, whereas the "=" is a Single-Row Operator and EXPECTS only ONE ROW to be returned from the Subquery.

Here is the Subquery on its own.

We can see that it returned three rows, one per Status Group.

mr_03.jpg

Our mistake is that, we are using a Single-Row operator (=) for the Subquery.

. . WHERE GDP = (SELECT MAX(GDP) FROM COUNTRIES GROUP BY STATUS)

If we alter the SELECT statement as below and use a Multiple-Row Operator instead, we do get the result set we want.

SELECT	COUNTRY_ID,
COUNTRY_NAME,	
CASE	STATUS
    WHEN 'M' THEN 'Member'
    WHEN 'C' THEN 'Candidate'
    WHEN 'O' THEN 'Other'
END STATUS_CLASS,
GDP
FROM COUNTRIES
WHERE GDP IN
    (SELECT	MAX(GDP)
    FROM	COUNTRIES
    GROUP BY STATUS)
ORDER BY GDP DESC;

Countries with the HIGHEST GDP(Gross Domestic Product) within each STATUS ('M', 'C', 'O') Group are displayed.

mr_02.jpg

The Main Query appears to the server as the following.

SELECT	COUNTRY_ID,
COUNTRY_NAME,	
CASE	STATUS
    WHEN 'M' THEN 'Member'
    WHEN 'C' THEN 'Candidate'
    WHEN 'O' THEN 'Other'
END STATUS_CLASS,
GDP
FROM	COUNTRIES
WHERE	GDP IN
    (10509000000000, 2322000000000, 319000000000)
ORDER BY GDP DESC;

beautiful_sql_04.jpg

MULTIPLE-ROW ANY

ANY Operator and SOME Operator are SYNONYMS.

ANY compares a Value to EACH Value returned by a Subquery.

The Condition must hold for AT LEAST ONE Value in the Result Set of the Subquery.

= ANY is equivalent to IN.

< ANY is equivalent to LESS THAN the MAXIMUM.

> ANY is equivalent to GREATER THAN the MINIMUM.

Report Requirement

We are asked to create a report of those countries which are Members, and which have GDPs LESS THAN ANY of the Candidate countries. The result set will be in descending GDP values.

SELECT	COUNTRY_ID,
COUNTRY_NAME,	
GDP
FROM	COUNTRIES
WHERE	GDP < ANY
    (SELECT	GDP
    FROM	COUNTRIES
    WHERE 	STATUS='C')
AND	STATUS='M'
ORDER BY GDP DESC;

Here is the Result Set of the Query.

mr_any_01.jpg

Below is the INNER Query / SUBQUERY.

SELECT	GDP
FROM	COUNTRIES
WHERE 	STATUS='C'

This is what the INNER Query / SUBQUERY returns.

mr_any_02.jpg

< ANY is equivalent to LESS THAN the MAXIMUM.

The MAXIMUM value in the Result Set of the SUBQUERY is 319000000000.

Therefore, the MAIN QUERY appears to the Server as the following.

SELECT	COUNTRY_ID,
COUNTRY_NAME,	
GDP
FROM	COUNTRIES
WHERE	GDP   <  319000000000
AND	STATUS='M'
ORDER BY GDP DESC;

These are the rows of Member Countries, whose GDPs are LESS THAN THE MAXIMUM GDP in the RESULT SET OF THE INNER QUERY.

mr_any_03.jpg

We achieved this result using the < ANY Operator:

            

SELECT COUNTRY_ID, COUNTRY_NAME, GDP FROM COUNTRIES WHERE GDP < ANY (SELECT GDP FROM COUNTRIES WHERE STATUS='C') AND STATUS='M' ORDER BY GDP DESC;

MULTIPLE-ROW ALL

ALL compares a Value to EVERY Value returned by a Subquery.

The Condition must hold for ALL Values in the Result Set of the Subquery.

!= ALL is equivalent to NOT IN.

< ALL is equivalent to LESS THAN the MINIMUM.

> ALL is equivalent to GREATER THAN the MAXIMUM.

Report Requirement

We are asked to create a report of those countries which are Members, and which have GDPs LESS THAN ALL of the Candidate countries. The result set will be in descending GDP values.

SELECT COUNTRY_ID,
COUNTRY_NAME,	
GDP
FROM COUNTRIES
WHERE GDP < ALL
    (SELECT GDP
    FROM	COUNTRIES
    WHERE 	STATUS='C')
AND STATUS='M'
ORDER BY GDP DESC;

Here is the Result Set of the Query.

Zero rows returned, i.e. NO ROWS are returned.

mr_all_01.jpg

Below is the INNER Query / SUBQUERY.

ORDER BY is for displaying the data more meaningfully. It is not in the actual subquery.

SELECT GDP
FROM COUNTRIES
WHERE STATUS='C'
ORDER BY GDP

This is what the INNER Query / SUBQUERY returns.

mr_all_02.jpg

< ALL is equivalent to LESS THAN the MINIMUM.

The MINIMUM value in the Result Set of the SUBQUERY is 5000000000.

Therefore, the MAIN QUERY appears to the Server as the following.

SELECT COUNTRY_ID,
COUNTRY_NAME,	
GDP
FROM COUNTRIES
WHERE GDP < 5000000000
AND STATUS='M'
ORDER BY GDP DESC;

The query returns NO ROWS.

This is because, there is NO Member Country with a GDP which is LESS THAN the MINIMUM GDP in the Result Set of the INNER QUERY.

mr_all_03.jpg

We achieved this result using the < ALL Operator:

            

SELECT COUNTRY_ID, COUNTRY_NAME, GDP FROM COUNTRIES WHERE GDP < ALL (SELECT GDP FROM COUNTRIES WHERE STATUS='C') AND STATUS='M' ORDER BY GDP DESC;

NULL Values in a Subquery

Report Requirements

We want to find out those Political Groups that DO NOT HAVE any Members of the European Parliament.

The Query for the Report Requirement

As an answer to the Report Requirement, we create and run the following query.

select pg_id,
pg_code,
pg_name
from political_groups
where pg_id NOT IN 
    (select distinct pg_id
    from meps);

Result Set of the Query

This is the Result Set of the query.

The query returns NO ROWS !?

nv_01.jpg

We now run the query above which returned NO ROWS, as two separate Queries:

The Outer Query

Here is the Outer Query WITHOUT the WHERE Clause for the INNER QUERY.

ORDER BY Clause is added for display purposes only.

select pg_id,
pg_code,
pg_name
from political_groups
order by pg_id;

The Outer Query returned 9 rows.

These are the 9 Political Groups of the European Parliament.

They MIGHT NOT all have Members of the Parliament.

nv_02.jpg

The Inner Query

We now run the INNER QUERY on its own.

We include the ORDER BY Clause for display purposes only.

select distinct pg_id
from meps
order by pg_id;

Here is the output of the INNER QUERY.

These are the Political Groups of the European Parliament, that HAVE Members of the Parliament.

The last row with a NULL Value indicates the Non-Attached Members of the European Parliament.

As we know from our Data Model, the Non-Attached Members of the Parliament have a NULL PG_ID.

nv_03.jpg

The Result We Expect

The result set of the Outer Query is: (20, 40, 60, 80, 100, 140, 160, 200901, 200902)

The result set of the Inner Query is: (20, 40, 60, 80, 100, 140, 160, NULL)

Looking at the Result Sets of the Outer Query and the Inner Query, the logical result we expect to see is the following two political groups: (200901, 200902)

PG_ID PG_CODE PG_NAME
200901 NEW-EN Group of the New Energy
200902 IE-IT Group of the Innovation Economy and Information Technology

Obviously, these are the two Political Groups that DO NOT HAVE any Members.

But, why don't they appear in the Result Set of Our Query?

Why is the Result Set of Our Query seen below, has No Rows?

select pg_id,
pg_code,
pg_name
from political_groups
where pg_id NOT IN 
    (select distinct pg_id
    from meps);
    

Some Questions on a Query Returning Rows

Question: When does a Query Return any Rows?

Answer: A Query Returns Rows, when the Logical Condition/Conditions in the WHERE Clause Evaluate to TRUE.

Question: What can the Logical Condition/Conditions in the WHERE Clause Evaluate to?

Answer: The Logical Condition/Conditions in a WHERE Clause can evaluate to one of the below:

Question: What does the Condition in our query seen below, Evaluate to?

where pg_id NOT IN 
    (select distinct pg_id
    from meps);

Answer: This is a lengthy answer.

We will see how to proceed, in the next section.

Our Particular Case

The query in our example is seen below:

select pg_id,
pg_code,
pg_name
from political_groups 
where pg_id NOT IN 
    (select distinct pg_id
    from meps);    

During query execution, the WHERE Clause Condition becomes:

(20, 40, 60, 80, 100, 140, 160, 200901, 200902)
NOT IN
(20, 40, 60, 80, 100, 140, 160, NULL)

Each value in the First Set (Result Set of the OUTER Query) is compared with each value in the Second Set (Result Set of the INNER Query).

There are Nine 9 Values in the First Set, therefore there will be Nine 9 Comparison Conditions.

Any Comparison Condition which yields TRUE, will return that value from the First Set.

REMINDER

WHERE variable_01 NOT IN (variable_02, variable_03)

IS IDENTICAL TO

WHERE (variable_01 != variable_02) AND (variable_01 != variable_03)

First Comparison

FIRST CONDITION

The value 20 in the First Set is compared with each of the values in the Second Set.

20!=20 AND 20!=40 AND 20!=60 AND 20!=80 AND 20!=100 AND 20!=140 AND 20!=160 AND 20!=NULL

which evaluates to;

FALSE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND NULL

which evaluates to;

FALSE

Therefore, the First Condition does NOT RETURN any rows.

Hence, the row with pg_id = 20 is NOT SELECTED.

Second Comparison

SECOND CONDITION

The value 40 in the First Set is compared with each of the values in the Second Set.

40!=20 AND 40!=40 AND 40!=60 AND 40!=80 AND 40!=100 AND 40!=140 AND 40!=160 AND 40!=NULL

which evaluates to;

TRUE AND FALSE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND NULL

which evaluates to;

FALSE

Therefore, the Second Condition does NOT RETURN any rows.

Hence, the row with pg_id = 40 is NOT SELECTED.

Third to Seventh Comparisons

Likewise for THIRD to SEVENTH CONDITIONS

Similarly, all of the Third to Seventh Conditions evaluate to FALSE.

Hence, they DO NOT RETURN any rows either.

60, 80, 100, 140 and 160 are NOT SELECTED

Eighth Comparison

EIGHTH CONDITION

The value 200901 in the First Set is compared with each of the values in the Second Set.

200901!=20 AND 200901!=40 AND 200901!=60 AND 200901!=80 AND 200901!=100 AND 200901!=140 AND 200901!=160 AND 200901!=NULL

which evaluates to;

TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND NULL

which evaluates to;

NULL

Therefore, the Eighth Condition does NOT RETURN any rows.

Hence, the row with pg_id = 200901 is NOT SELECTED.

Ninth Comparison

NINTH CONDITION

The value 200902 in the First Set is compared with each of the values in the Second Set.

200902!=20 AND 200902!=40 AND 200902!=60 AND 200902!=80 AND 200902!=100 AND 200902!=140 AND 200902!=160 AND 200902!=NULL

which evaluates to;

TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND TRUE AND NULL

which evaluates to;

NULL

Therefore, the Ninth Condition does NOT RETURN any rows.

Hence, the row with pg_id = 200902 is NOT SELECTED.

NONE of the NINE Comparison Conditions returned TRUE .

Therefore, NO ROWS are RETURNED.

Let's Get This Right

The query should be re-written as follows:

select pg_id,
pg_code,
pg_name
from political_groups
where	pg_id  NOT IN 
        (select	 distinct pg_id
        from meps
        where pg_id is NOT NULL);

We added the WHERE Condition seen below into the INNER QUERY.

This WHERE Condition makes sure that there will be only NON NULL values in the Result Set of the INNER QUERY.

where pg_id is NOT NULL

We can now see the Political Groups that DO NOT HAVE any Members of the Parliament just as yet.

nv_04.jpg

"Success is not final;

Failure is not fatal.

It is the courage to continue that counts."

— Winston Churchill

churchill_01.png

FROM Clause Query / INLINE VIEW

We want to retrieve details of those countries which have a GDP MORE THAN the Average GDP of their Status Group.

Please remember that there are three Groups of Status in our Database:

In the Query which brings us the required rows, we use an INNER Query within the FROM Clause.

This INNER Query in the FROM Clause is treated like a "temporary table" during the lifetime of the query.

An INNER Query within the FROM Clause of a SELECT Statement is also called an INLINE VIEW.

Here is our Query:

SELECT C_ALL.COUNTRY_NAME,		
C_ALL.STATUS,	
C_ALL.GDP,
ROUND(C_AVG_GDP.AVG_GDP) "Average GDP of Status Group"
FROM COUNTRIES C_ALL, 	
        (SELECT	STATUS,	
        AVG(GDP) AVG_GDP
        FROM COUNTRIES
        GROUP BY STATUS) C_AVG_GDP
WHERE C_ALL.STATUS = C_AVG_GDP.STATUS
AND C_ALL.GDP > C_AVG_GDP.AVG_GDP
ORDER BY C_ALL.STATUS, C_ALL.GDP DESC;

Here is the Result Set of the query:

inline_01.jpg

TOP-N Query

TOP-N Query Using the ROWNUM Pseudocolumn

Problem

We want to retrieve those Member Countries which have the TOP-5 Purchasing Power Standard.

To be able to get the required results, we make use of the pseudocolumn ROWNUM.

What is ROWNUM?

For each row returned by a query, the ROWNUM pseudocolumn returns "a number indicating the ORDER" in which the row is selected from a table or from a set of joined rows.

The First Row selected has a ROWNUM of 1 , the Second Row selected has a ROWNUM of 2, and so on.

ROWNUM can be used to LIMIT the number of rows returned by a query, as we will do in our example.

Step 1

The Initial Step is, to Order the Countries in Descending Order of their Purchasing Power Standard.

The column GDP_PPS holds the Purchasing Power Standard.

Here is our Step 1 Query:

SELECT COUNTRY_NAME,
GDP_PPS
FROM	COUNTRIES
WHERE	STATUS='M'
ORDER BY GDP_PPS DESC;

The Query of Step 1 will be used as the INNER Query in the FROM Clause of the MAIN Query in Step 2.

As we know from the previous section, Result Set of the INNER Query in the FROM Clause is treated like a "temporary table".

The Result Set of the INNER Query is seen below.

The rows will be selected EXACTLY in the order seen below from the INNER Query "temporary table".

tn_rownum_01.jpg

Step 2

The Step 2 Query (MAIN Query) seen below, uses the Query of Step 1 as an INNER Query in the FROM Clause (INLINE VIEW).

SELECT	ROWNUM "Rank",
COUNTRY_NAME "Top Five",
GDP_PPS "PPS"
FROM	(SELECT COUNTRY_NAME,
        GDP_PPS
        FROM	COUNTRIES
        WHERE	STATUS='M'
        ORDER BY GDP_PPS DESC)
WHERE	ROWNUM <=5;

Our Answer to the Problem is seen below.

We retrieved those Member Countries which have the TOP-5 Purchasing Power Standard.

tn_rownum_02.jpg

There is an ORDER BY Clause in the INNER Query:

ORDER BY GDP_PPS DESC

We placed the WHERE Condition with ROWNUM in the MAIN Query:

WHERE ROWNUM <=5

This condition LIMITED the number of rows to 5.

We forced the WHERE Condition with ROWNUM to be applied AFTER the rows are ORDERED in the INNER Query.

As a result, the query returned the FIRST FIVE Countries with the HIGHEST GDP_PPS, in DESCENDING order of GDP_PPS.

This type of query is referred to as a TOP-N Query.

In this section, we had an example of a TOP-N Query with the Pseudocolumn ROWNUM.

TOP-N Query Using the Analytic Function ROW_NUMBER

Problem

We want to retrieve those Countries which have the TOP-3 Purchasing Power Standard within their Status Group.

In our report, we want to display:

The Report Output we want to obtain is seen below.

row_number_02.jpg

To be able to get the required results, we make use of the Analytic Function ROW_NUMBER.

What is ROW_NUMBER?

The INNER QUERY with ROW_NUMBER

Here is our INNER QUERY:

SELECT STATUS,
COUNTRY_NAME,	
GDP_PPS,
ROW_NUMBER() OVER 
    (PARTITION BY STATUS ORDER BY GDP_PPS DESC) STATUS_RANK
FROM COUNTRIES
WHERE GDP_PPS IS NOT NULL

Below is the Result Set of the INNER Query.

row_number_03.jpg

The complete Result Set is seen below.

row_number_04.jpg

ROW_NUMBER() OVER (PARTITION BY .. ORDER BY ..)

We have the following expression in the INNER QUERY.

ROW_NUMBER() OVER 
    (PARTITION BY STATUS ORDER BY GDP_PPS DESC) STATUS_RANK

ROW_NUMBER() OVER (PARTITION BY STATUS ORDER BY GDP_PPS DESC) STATUS_RANK

Let's analyze this expression by each Clause.

PARTITION BY STATUS

This Clause DIVIDES the COUNTRIES table into SUBSETS by the values in the column STATUS.

The values in the column STATUS are 'C', 'M', and 'O'.

Hence we now have three SUBSETS of the COUNTRIES table.

ORDER BY GDP_PPS DESC

This Clause ORDERS the COUNTRIES in DESCENDING ORDER of their GDP_PPS .

Hence, these Two Clauses together;

(PARTITION BY STATUS ORDER BY GDP_PPS DESC)

ROW_NUMBER() OVER

This Clause ASSIGNS each COUNTRY a RANK WITHIN EACH SUBSET , which are already ordered in descending GDP_PPS.

ROW_NUMBER() OVER (PARTITION BY STATUS ORDER BY GDP_PPS DESC)

This whole expression above, created a New Column.

We now give this new column an ALIAS:

STATUS_RANK

This new column DOES NOT EXIST in the Countries table.

It ONLY EXISTS during the lifetime of the query.

Once again, we display our INNER QUERY and its Result Set below.

SELECT STATUS,
COUNTRY_NAME,	
GDP_PPS,
ROW_NUMBER() OVER 
(PARTITION BY STATUS ORDER BY GDP_PPS DESC) STATUS_RANK
FROM COUNTRIES
WHERE GDP_PPS IS NOT NULL

row_number_03.jpg row_number_04.jpg

Our INNER QUERY Result Set is ready to be used in the OUTER QUERY.

The OUTER QUERY

Using the query below, we find the Countries with the three highest GDP_PPS WITHIN each Status Group.

We placed the WHERE Condition with the New Column in the OUTER Query:

WHERE STATUS_RANK <=3

This condition which uses the New Column STATUS_RANK, LIMITS the number of rows within Each STATUS Group to 3 .

SELECT	 STATUS_RANK,
STATUS,
COUNTRY_NAME,
GDP_PPS
FROM (SELECT STATUS,
    COUNTRY_NAME,	
    GDP_PPS,
    ROW_NUMBER() OVER 
        (PARTITION BY STATUS ORDER BY GDP_PPS DESC) STATUS_RANK
    FROM	COUNTRIES
    WHERE 	GDP_PPS IS NOT NULL)
WHERE STATUS_RANK  <=3
ORDER BY STATUS, GDP_PPS DESC, COUNTRY_NAME;

Below is the Result Set of the OUTER Query.

row_number_01.jpg

In this section, we had an example of a TOP-N Query with the Analytic Function ROW_NUMBER.

CORRELATED Subqueries

What is a CORRELATED Subquery?

A Correlated Subquery is performed when the Subquery REFERENCES a column in a table in the MAIN Query (OUTER Query or the PARENT Statement).

The PARENT Statement can be a SELECT, INSERT, UPDATE or a DELETE Statement in which the Subquery is nested.

In any Subquery, OTHER THAN a Correlated Subquery, the INNER Query (the SUBQUERY);

A Correlated Subquery however, EXECUTES ONCE for EACH CANDIDATE ROW considered by the MAIN Query(OUTER Query/PARENT Query).

In other words, the Correlated Subquery (Inner Query ) is DRIVEN BY the MAIN Query(OUTER Query/PARENT Query).

MAIN Query(OUTER Query/PARENT Query) Retrieves a Candidate Row FIRST.

Let's have an Example of a Correlated Subquery.

Example of a CORRELATED Subquery

Find those Countries which have Purchasing Power Standard values MORE THAN the Average Purchasing Power Standard of their Status Group.

In this case, the Correlated Subquery specifically computes the Average Purchasing Power Standard of EACH Status Group (STATUS column).

However, the Value of the Status Group in the Correlated Subquery DEPENDS ON the Value of the Status Column in EACH ROW processed by the MAIN Query(OUTER Query/PARENT Query).

The following SELECT Statement returns Countries which have Purchasing Power Standard values MORE THAN their Status Group Average.

select	outer_c.country_id,
outer_c.country_name,
outer_c.status,
outer_c.gdp_pps
from countries outer_c
where outer_c.gdp_pps > 
    (select avg(inner_c.gdp_pps)
    from countries inner_c
    where inner_c.status = outer_c.status)
order by outer_c.status, 	
outer_c.gdp_pps desc;

Here is the Result Set of the Query.

corr_01.jpg

How Does a CORRELATED Subquery Work?

  1. A Candidate Row is retrieved from the Main Query.
  2. The Correlated Subquery(INNER Query) is executed using a Candidate Row value from the Main Query.
  3. The value retrieved by the Correlated Subquery is used to Qualify or Disqualify the Candidate Row.
  4. Go to Step 1 and process until all rows are read.

EXISTS Operator in CORRELATED Subqueries

EXISTS Operator is used to test whether a value retrieved by the MAIN Query exists in the Result Set of the INNER Query (Correlated Subquery).

(EXISTS) INNER Query Returns AT LEAST ONE ROW

If the INNER Query (Correlated Subquery) Returns AT LEAST ONE ROW:

(EXISTS) INNER Query Returns NO ROWS

If the INNER Query (Correlated Subquery) Returns NO ROWS:

Example ON EXISTS

Requirement:

Display details of those Political Groups that have at least one MEP (Member of the Parliament) belonging to them.

Answer:

To answer this requirement, we write the Query below.

select outer_q.pg_id,	
outer_q.pg_code,
outer_q.pg_name
from political_groups outer_q
where EXISTS (select 1
            from meps inner_q
            where inner_q.pg_id = outer_q.pg_id);

Here is the Result Set of the Query.

exists_01.jpg

EXISTS Operator ensures that the search in the INNER QUERY (Correlated Subquery) does NOT CONTINUE when AT LEAST ONE MATCH is found for the Political Group.

We can replace the EXISTS Operator with the IN Operator, and use the column pg_id instead of a CONSTANT in the INNER SELECT, and obtain the same result set, as seen below.

select outer_q.pg_id,	
outer_q.pg_code,
outer_q.pg_name
from political_groups outer_q
where outer_q.pg_id IN 
        (select	 distinct inner_q.pg_id
        from meps inner_q
        where inner_q.pg_id is not null);

Here is the output of the query.

exists_02.jpg

NOT EXISTS Operator in CORRELATED Subqueries

NOT EXISTS Operator is used to test whether a value retrieved by the MAIN Query DOES NOT EXIST in the Result Set of the INNER Query (Correlated Subquery).

(NOT EXISTS) INNER Query Returns AT LEAST ONE ROW

If the INNER Query (Correlated Subquery) Returns AT LEAST ONE ROW:

(NOT EXISTS) INNER Query Returns NO ROWS

If the INNER Query (Correlated Subquery) Returns NO ROWS:

Example ON NOT EXISTS

Requirement:

Display details of those Political Groups that have currently NO MEPs (Members of the Parliament) belonging to them.

Answer:

To answer this requirement, we write the Query below.

select	outer_q.pg_id,	
outer_q.pg_code,
outer_q.pg_name
from political_groups outer_q
where	NOT EXISTS (select 1
            from meps inner_q
            where inner_q.pg_id = outer_q.pg_id);

Here is the Result Set of the Query(NOT EXISTS).

not_exists_01.jpg

We can replace the NOT EXISTS Operator with the NOT IN Operator, and use the column pg_id instead of a CONSTANT in the INNER SELECT, and obtain the same result set, as seen below.

select outer_q.pg_id,	
outer_q.pg_code,
outer_q.pg_name
from political_groups outer_q
where outer_q.pg_id NOT IN 
        (select	 distinct inner_q.pg_id
        from meps inner_q
        where inner_q.pg_id is not null);

Here is the Result Set of the Query (NOT IN).

not_exists_02.jpg

This resumes the Chapter on Subqueries.

Breath of Fresh Air

References

just me