KayeILSQL-07-Aggregate Functions

Contents

Based On

About Aggregate Functions

USING the GROUP BY Clause

COUNT() Group Function

HAVING Clause

FIRST and LAST

References

Based On

Based On — Experience

The contents of this work are based on;

Based On — European Parliament as Data Model

This text on SQL is based on the EP-European Parliament as Data Model.

The European Parliament (EP) is the parliamentary institution of the European Union (EU).

The European Parliament is elected by the citizens of the European Union to represent their interests.

Its origins go back to the 1950s and the founding Treaties. Since 1979 its members have been directly elected by the citizens of the EU.

Elections are held every five years, and every EU citizen is entitled to vote, and to stand as a candidate, wherever they live in the EU. Parliament thus expresses the democratic will of the European Union's nearly 500 million citizens and it represents their interests in discussions with the other EU institutions.

Please Note:

The sources for the data, which are all freely obtainable from the European Union website, and from several leaflets of the EU institutions, are listed in the References section.

These sources are used to create the fictitious Business Narrative of the "Data Modelling and Normalization" Chapter.

We called this Business Narrative; "the Story of the European Union" which explains about the EU and EP, for purposes of Data Modeling.

The Story of the European Union is the last Chapter of Kaye is Learning SQL.

Based On — Oracle Database as RDBMS

This text on SQL is based on the Oracle Database as RDBMS — Relational Database Management System.

Although the SQL Statements used in this text, apply specifically to the Oracle Database implementation of SQL, they are generally compatible with many other RDBMS SQL implementations.

The knowledge in the following sections are based on my personal work experience, as well as on several Oracle Documentation, which are freely downloadable from Oracle websites.

You can find a list of these documents in the References.

The name of the documents may change in time.

You can refer to the Oracle Documentation if you need more detailed information on any of the topics in this text.

Any errors present, are mine.

Breath of Fresh Air

About Aggregate Functions

Aggregate Functions return a single result row based on groups of rows, rather than on single rows.

Aggregate Functions are used in:

Dictionary Definition of Aggregate

Oxford Languages

A whole formed by combining several separate elements.

Merriam-Webster Dictionary

To collect or gather into a mass or whole.

Formed by the collection of units or particles into a body, mass, or amount.

Cambridge Dictionary

To combine into a single group or total.

Definition of Aggregate in SQL

Aggregate Functions return a single result row BASED ON GROUPS OF ROWS rather than on single rows.

Aggregate Functions can appear in SELECT lists, and in ORDER BY and HAVING Clauses.

They are commonly used with the GROUP BY Clause in a SELECT statement, where the database divides the rows of a queried table or view into GROUPS.

In a query containing a GROUP BY Clause, the elements of the SELECT list CAN BE;

The database applies the Aggregate Functions to EACH GROUP of ROWS and returns a SINGLE RESULT row for EACH GROUP.

If you OMIT the GROUP BY Clause, then database applies Aggregate Functions in the select list to ALL THE ROWS in the queried table or view.

You use Aggregate Functions in the HAVING Clause to ELIMINATE GROUPS from the output BASED ON the results of the Aggregate Functions, rather than on the values of the individual rows of the queried table or view.

USING the GROUP BY Clause

GROUP BY - Example 1

To return the SUM of Salaries FOR EACH POLITICAL GROUP in the European Parliament, issue the following SELECT statement:

SELECT PG_ID "Political Group",
SUM(SALARY) "Total Salary"
FROM	MEPS
GROUP BY PG_ID
ORDER BY PG_ID;            
        

group_ex1_01.jpg

Last row belongs to the Non-Attached Members of the European Parliament.

As we know from our Data Model, the Non-Attached Members of the Parliament have a NULL PG_ID.

GROUP BY - Example 2

To return the MAXIMUM and MINIMUM Salaries FOR EACH POLITICAL GROUP in the European Parliament, issue the following SELECT statement:

SELECT PG_ID "Political Group",	
MIN(SALARY) "Minimum Salary",	
MAX(SALARY) "Maximum Salary"
FROM	MEPS
GROUP BY PG_ID
ORDER BY PG_ID;           
        

group_ex2_01.jpg

GROUP BY - Example 3

An Aggregate Function / Group Function returns ONE Result on a Set of Rows.

This Set can be a WHOLE TABLE, in which case the Function works on ALL the Rows of the Table and returns ONE Result.

For example, the following SELECT Statement returns the MINIMUM, AVERAGE and MAXIMUM salary of all the Member of the Parliaments in table MEPS.

SELECT MIN(SALARY) "Minimum Salary",
AVG(SALARY) "Average Salary",
MAX(SALARY) "Maximum Salary"
FROM MEPS;        
        

group_ex3_01.jpg

Because the GROUP BY Clause is OMITTED, Aggregate Functions in the SELECT list ( MIN, AVG, MAX ) are applied to ALL THE ROWS of Table MEPS.

GROUP BY - Example 4

An Aggregate Function / Group Function returns ONE Result on a Set of Rows.

This Set can be a SUBSET of a TABLE.

In such a case, the Function works on the Rows of the SUBSET of the Table and returns ONE Result.

For example, the following SELECT Statement returns the MINIMUM, AVERAGE and MAXIMUM salary of the Member of the Parliaments who belong to the Political Group with PG_ID=80 within table MEPS.

SELECT MIN(SALARY) "Minimum Salary",
AVG(SALARY) "Average Salary",
MAX(SALARY) "Maximum Salary"
FROM MEPS
WHERE PG_ID=80;          
        

group_ex4_01.jpg

The clause WHERE PG_ID=80 specifies a SUBSET of table MEPS.

Because the GROUP BY Clause is OMITTED in the SELECT Statement, Aggregate Functions / Group Functions in the SELECT list ( MIN, AVG, MAX ) are applied to ALL THE ROWS of this SUBSET (Members with PG_ID=80) of MEPS.

GROUP BY - Example 5

An Aggregate Function / Group Function returns ONE Result on a Set of Rows.

There can be MORE THAN ONE SUBSET that the Aggregate Function / Group Function must operate on.

In such a case, the Function works on the Rows of EACH SUBSET of the Table and returns ONE Result per SUBSET.

For example, the following SELECT Statement returns the MINIMUM, AVERAGE and MAXIMUM salary of the Member of the Parliaments who belong to a Political Group (WHERE PG_ID IS NOT NULL), i.e. who are NOT Non-Attached Members of the European Parliament.

As we know from our Data Model, the Non-Attached Members of the Parliament have a NULL PG_ID.

SELECT	PG_ID,
MIN(SALARY) "Minimum Salary",
AVG(SALARY) "Average Salary",
MAX(SALARY) "Maximum Salary"
FROM	MEPS
WHERE	PG_ID IS NOT NULL
GROUP BY PG_ID
ORDER BY PG_ID;           
        

The SELECT Statement above initially ELIMINATES rows with a PG_ID of NULL.

This is achieved by the WHERE clause: WHERE PG_ID IS NOT NULL

Members of the European Parliament belong to Seven DISTINCT Political Groups, identified by their PG_ID.

The SELECT Statement then divides the rows into SUBSETS BY EACH PG_ID.

This is achieved by the GROUP BY clause: GROUP BY PG_ID

SUBSET 1: MEPs who belong to Political Group 1

SUBSET 2: MEPs who belong to Political Group 2

SUBSET 3: MEPs who belong to Political Group 3

:

:

SUBSET 7: MEPs who belong to Political Group 7

The SELECT Statement then calculates the MINIMUM, AVERAGE, and MAXIMUM Salaries FOR EACH SUBSET.

This is achieved by the Aggregate Functions / Group Functions in the SELECT columns list:

Below is the Result Set of this SELECT statement.

group_ex5_01.jpg

GROUP BY - Example 6

We can also GROUP BY MORE THAN ONE COLUMN.

In the example below, Members of the Parliament are divided into SUBSETS within EACH POLITICAL GROUP, and then within EACH COUNTRY.

This is achieved by the GROUP BY clause: GROUP BY PG_ID, COUNTRY_ID

The SELECT Statement then calculates the MAXIMUM Salary of EACH Small SUBSET;

COUNTRY within a POLITICAL GROUP

SELECT PG_ID,
COUNTRY_ID,
MAX(SALARY) "Maximum Salary"
FROM	MEPS
WHERE PG_ID IS NOT NULL
GROUP BY PG_ID, COUNTRY_ID
ORDER BY PG_ID, COUNTRY_ID ;           
        

We can see the Result Set output in three consecutive images below. Not all rows are included.

group_ex6_01.jpg

group_ex6_02.jpg

group_ex6_03.jpg

COUNT() Group Function

COUNT(*)

COUNT(*) returns the number of rows.

COUNT(column_name)

COUNT(column_name) returns the number of rows with NON NULL values for the column column_name.

COUNT(DISTINCT column_name)

COUNT(DISTINCT column_name) returns the number of DISTINCT NON NULL values of the column column_name.

COUNT(*)

The SELECT Statement below returns the number of rows in table MEPS.

We can see from the output that there are 787 Members of the European Parliament.

SELECT COUNT(*)
FROM   MEPS;
    

count_ex_01.jpg

COUNT(column_name)

The SELECT Statement below returns the number of rows in table MEPS, that have NON-NULL Values in column PG_ID.

We can see from the output that there are 757 Members of the European Parliament, who belong to a Political Group.

SELECT COUNT(PG_ID)
FROM   MEPS;   
        

count_ex_02.jpg

The same result can be obtained with the following SELECT Statement.

SELECT	COUNT(*)
FROM	MEPS
WHERE PG_ID IS NOT NULL; 
        

count_ex_04.jpg

COUNT(DISTINCT column_name)

The SELECT Statement below, returns the number of DISTINCT NON NULL values of column PG_ID in table MEPS.

We already know that there are Seven Political Groups in the European Parliament with Members, and the output confirms that.

SELECT COUNT(DISTINCT PG_ID)
FROM   MEPS;    
        

count_ex_03.jpg

COUNT() Example

Question: How many Members of the Parliament are there per Political Group, in the European Parliament?

The Result Set of the following SELECT Statement answers this question.

SELECT	PG_ID, COUNT(MEP_ID)
FROM	MEPS
GROUP BY PG_ID
ORDER BY PG_ID
        

count_ex_05.jpg

Another COUNT() Example

Question: How many distinct countries do the members of each political group belong to?

The Result Set of the following SELECT Statement answers this question.

SELECT PG_ID, COUNT(DISTINCT COUNTRY_ID)
FROM	MEPS
GROUP BY PG_ID
ORDER BY PG_ID
        

count_ex_06.jpg

HAVING Clause

HAVING - About

HAVING Clause works AFTER;

HAVING clause ELIMINATES SUBSETS / GROUPS from the Output / the Result Set;

Another way of saying this is:

HAVING eliminates Groups of Rows AFTER the Aggregate Function works on the rows.

WHERE eliminates Rows BEFORE the Aggregate Function works on them.

roses-05.jpg

HAVING - ORDER of PROCESS

Here is the ORDER of PROCESS:

  1. The WHERE Clause, if exists, ELIMINATES the ROWS.
  2. The GROUP BY Clause, DIVIDES the remaining rows, INTO SUBSETS.
  3. The AGGREGATE / GROUP functions are APPLIED on the SUBSETS.
  4. The HAVING Clause ELIMINATES the SUBSETS BASED ON the Results of the AGGREGATE / GROUP functions.

HAVING - Example

Report Requirements

Query with No Restrictions/Eliminations

The SQL Query / SELECT Statement below, retrieves the MINIMUM, AVERAGE, and MAXIMUM salaries of the Members of the European Parliament in EACH Political Group WITHOUT ANY RESTRICTIONS / ELIMINATIONS.

SELECT	PG_ID,
MIN(SALARY) "Minimum Salary",
AVG(SALARY) "Average Salary",
MAX(SALARY) "Maximum Salary"
FROM	MEPS
GROUP BY PG_ID
ORDER BY PG_ID;    
        

The Result set seen below, lists the MINIMUM, AVERAGE, and MAXIMUM salaries.

It lists the Non-Attached Members as well. Non-Attached Members have a PG_ID of NULL.

The Result set also lists ALL the Political Groups , even if their MINIMUM Salary IS NOT LESS THAN 15000.

having_ex_01_01.jpg

We can use the WHERE Clause to ELIMINATE the rows of the Non-Attached Members. That is, Members with a PG_ID value of NULL.

We can use the HAVING Clause to ELIMINATE the SUBSETS divided by the GROUP BY Clause , and the Aggregate/Group Functions APPLIED TO. The result of the Aggregate Function MIN(SALARY) should be GREATER THAN OR EQUAL TO 15000, to be ELIMINATED.

having_ex_01_04.jpg

WHERE Clause Elimination

The query below now includes the WHERE Clause which ELIMINATES the rows with a NULL PG_ID.

SELECT	PG_ID,
MIN(SALARY) "Minimum Salary",
AVG(SALARY) "Average Salary",
MAX(SALARY) "Maximum Salary"
FROM	MEPS
WHERE PG_ID IS NOT NULL
GROUP BY PG_ID
ORDER BY PG_ID;    
        

Now, the Result Set of the Query, DOES NOT INCLUDE the Non-Attached Members (PG_ID value NULL) of the European Parliament.

having_ex_01_03.jpg

HAVING Clause Elimination

The query below now includes the HAVING Clause which ELIMINATES the GROUPS with MINIMUM Salary NOT LESS THAN 15000.

SELECT	PG_ID,
MIN(SALARY) "Minimum Salary",
AVG(SALARY) "Average Salary",
MAX(SALARY) "Maximum Salary"
FROM	MEPS
WHERE PG_ID IS NOT NULL
GROUP BY PG_ID
HAVING MIN(SALARY) < 15000
ORDER BY PG_ID;
                
        

Now, the Result Set of the Query ONLY INCLUDES those Political Groups (SUBSETS of the MEPS table GROUPED BY their PG_ID) if the MINIMUM Salary of that Political Group is LESS THAN 15000.

having_ex_01_05.jpg

This resumes the example.

FIRST and LAST

FIRST and LAST — Report Requirements

Create a report which will list;

EARLIEST Oldest / FIRST JOIN_DATE

LATEST Youngest / LAST JOIN_DATE

Our Report will look like the following.

The numbers are symbolic.

MAX GDP_PPS in Earliest Joiners MIN GDP_PPS in Latest Joiners
987654321 123

The Result Set of the SELECT Statement we will write, will only have 1 ROW, and 2 COLUMNS, as seen above.

The 1st Column will hold the "GDP_PPS value of the Country" which has the HIGHEST(MAXIMUM) GDP_PPS among the Earliest Joiners.

The 2nd Column will hold the "GDP_PPS value of the Country" which has the LOWEST(MINIMUM) GDP_PPS among the Latest Joiners.

The data we will work on is in table COUNTRIES.

A Feel of Our Data

As the data we will work on is in table COUNTRIES, let's have a feel of our data.

We write the following SELECT Statement.

SELECT	COUNTRY_ID
,	COUNTRY_NAME
,	JOIN_DATE
,	GDP_PPS
FROM	COUNTRIES
WHERE	STATUS='M'
ORDER BY JOIN_DATE ASC
, 	GDP_PPS DESC;
        

The Result Set of the SELECT Statement can be seen below.

fl_02.jpg

The Result Set of the SELECT Statement with the SUBSET of Earliest Joiners, and the SUBSET of Latest Joiners.

fl_03.jpg

FIRST and LAST — SELECT Statement

Below is the SELECT Statement which retrieves the data for our Report.

The SELECT Statement uses the Clauses FIRST and LAST.

SELECT MAX(GDP_PPS) KEEP (DENSE_RANK FIRST ORDER BY JOIN_DATE) 
"MAX GDP PPS Over FIRST Join Date",
MIN(GDP_PPS) KEEP (DENSE_RANK LAST ORDER BY JOIN_DATE) 
"MIN GDP PPS Over LAST Join Date"
FROM	COUNTRIES
WHERE	STATUS='M';
        

The output of our report can be seen below.

fl_01.jpg

FIRST and LAST — About

FIRST and LAST are both Aggregate and Analytic functions.

They operate on a set of values from a set of rows that rank as the FIRST or LAST with respect to a given sorting specification.

The Aggregate function can be;

The KEEP keyword is for semantic clarity.

It indicates that only the FIRST or the LAST values of the Aggregate function will be returned.

DENSE_RANK FIRST or DENSE_RANK LAST indicates that the database will aggregate over only those rows with the minimum (FIRST) or the maximum (LAST) dense rank, which is also called Olympic Rank.

This resumes the Chapter on Aggregate Functions.

Breath of Fresh Air

References

just me