Contents
The contents of this work are based on;
This text on SQL is based on the EP-European Parliament as Data Model.
The European Parliament (EP) is the parliamentary institution of the European Union (EU).
The European Parliament is elected by the citizens of the European Union to represent their interests.
Its origins go back to the 1950s and the founding Treaties. Since 1979 its members have been directly elected by the citizens of the EU.
Elections are held every five years, and every EU citizen is entitled to vote, and to stand as a candidate, wherever they live in the EU. Parliament thus expresses the democratic will of the European Union's nearly 500 million citizens and it represents their interests in discussions with the other EU institutions.
Please Note:
The sources for the data, which are all freely obtainable from the European Union website, and from several leaflets of the EU institutions, are listed in the References section.
These sources are used to create the fictitious Business Narrative of the "Data Modelling and Normalization" Chapter.
We called this Business Narrative; "the Story of the European Union" which explains about the EU and EP, for purposes of Data Modeling.
The Story of the European Union is the last Chapter of Kaye is Learning SQL.
This text on SQL is based on the Oracle Database as RDBMS — Relational Database Management System.
Although the SQL Statements used in this text, apply specifically to the Oracle Database implementation of SQL, they are generally compatible with many other RDBMS SQL implementations.
The knowledge in the following sections are based on my personal work experience, as well as on several Oracle Documentation, which are freely downloadable from Oracle websites.
You can find a list of these documents in the References.
The name of the documents may change in time.
You can refer to the Oracle Documentation if you need more detailed information on any of the topics in this text.
Any errors present, are mine.
Aggregate Functions return a single result row based on groups of rows, rather than on single rows.
Aggregate Functions are used in:
Oxford Languages
A whole formed by combining several separate elements.
Merriam-Webster Dictionary
To collect or gather into a mass or whole.
Formed by the collection of units or particles into a body, mass, or amount.
Cambridge Dictionary
To combine into a single group or total.
Aggregate Functions return a single result row BASED ON GROUPS OF ROWS rather than on single rows.
Aggregate Functions can appear in SELECT lists, and in ORDER BY and HAVING Clauses.
They are commonly used with the GROUP BY Clause in a SELECT statement, where the database divides the rows of a queried table or view into GROUPS.
In a query containing a GROUP BY Clause, the elements of the SELECT list CAN BE;
The database applies the Aggregate Functions to EACH GROUP of ROWS and returns a SINGLE RESULT row for EACH GROUP.
If you OMIT the GROUP BY Clause, then database applies Aggregate Functions in the select list to ALL THE ROWS in the queried table or view.
You use Aggregate Functions in the HAVING Clause to ELIMINATE GROUPS from the output BASED ON the results of the Aggregate Functions, rather than on the values of the individual rows of the queried table or view.
To return the SUM of Salaries FOR EACH POLITICAL GROUP in the European Parliament, issue the following SELECT statement:
SELECT PG_ID "Political Group", SUM(SALARY) "Total Salary" FROM MEPS GROUP BY PG_ID ORDER BY PG_ID;
Last row belongs to the Non-Attached Members of the European Parliament.
As we know from our Data Model, the Non-Attached Members of the Parliament have a NULL PG_ID.
To return the MAXIMUM and MINIMUM Salaries FOR EACH POLITICAL GROUP in the European Parliament, issue the following SELECT statement:
SELECT PG_ID "Political Group", MIN(SALARY) "Minimum Salary", MAX(SALARY) "Maximum Salary" FROM MEPS GROUP BY PG_ID ORDER BY PG_ID;
An Aggregate Function / Group Function returns ONE Result on a Set of Rows.
This Set can be a WHOLE TABLE, in which case the Function works on ALL the Rows of the Table and returns ONE Result.
For example, the following SELECT Statement returns the MINIMUM, AVERAGE and MAXIMUM salary of all the Member of the Parliaments in table MEPS.
SELECT MIN(SALARY) "Minimum Salary", AVG(SALARY) "Average Salary", MAX(SALARY) "Maximum Salary" FROM MEPS;
Because the GROUP BY Clause is OMITTED, Aggregate Functions in the SELECT list ( MIN, AVG, MAX ) are applied to ALL THE ROWS of Table MEPS.
An Aggregate Function / Group Function returns ONE Result on a Set of Rows.
This Set can be a SUBSET of a TABLE.
In such a case, the Function works on the Rows of the SUBSET of the Table and returns ONE Result.
For example, the following SELECT Statement returns the MINIMUM, AVERAGE and MAXIMUM salary of the Member of the Parliaments who belong to the Political Group with PG_ID=80 within table MEPS.
SELECT MIN(SALARY) "Minimum Salary", AVG(SALARY) "Average Salary", MAX(SALARY) "Maximum Salary" FROM MEPS WHERE PG_ID=80;
The clause WHERE PG_ID=80 specifies a SUBSET of table MEPS.
Because the GROUP BY Clause is OMITTED in the SELECT Statement, Aggregate Functions / Group Functions in the SELECT list ( MIN, AVG, MAX ) are applied to ALL THE ROWS of this SUBSET (Members with PG_ID=80) of MEPS.
An Aggregate Function / Group Function returns ONE Result on a Set of Rows.
There can be MORE THAN ONE SUBSET that the Aggregate Function / Group Function must operate on.
In such a case, the Function works on the Rows of EACH SUBSET of the Table and returns ONE Result per SUBSET.
For example, the following SELECT Statement returns the MINIMUM, AVERAGE and MAXIMUM salary of the Member of the Parliaments who belong to a Political Group (WHERE PG_ID IS NOT NULL), i.e. who are NOT Non-Attached Members of the European Parliament.
As we know from our Data Model, the Non-Attached Members of the Parliament have a NULL PG_ID.
SELECT PG_ID, MIN(SALARY) "Minimum Salary", AVG(SALARY) "Average Salary", MAX(SALARY) "Maximum Salary" FROM MEPS WHERE PG_ID IS NOT NULL GROUP BY PG_ID ORDER BY PG_ID;
The SELECT Statement above initially ELIMINATES rows with a PG_ID of NULL.
This is achieved by the WHERE clause: WHERE PG_ID IS NOT NULL
Members of the European Parliament belong to Seven DISTINCT Political Groups, identified by their PG_ID.
The SELECT Statement then divides the rows into SUBSETS BY EACH PG_ID.
This is achieved by the GROUP BY clause: GROUP BY PG_ID
SUBSET 1: MEPs who belong to Political Group 1
SUBSET 2: MEPs who belong to Political Group 2
SUBSET 3: MEPs who belong to Political Group 3
:
:
SUBSET 7: MEPs who belong to Political Group 7
The SELECT Statement then calculates the MINIMUM, AVERAGE, and MAXIMUM Salaries FOR EACH SUBSET.
This is achieved by the Aggregate Functions / Group Functions in the SELECT columns list:
Below is the Result Set of this SELECT statement.
We can also GROUP BY MORE THAN ONE COLUMN.
In the example below, Members of the Parliament are divided into SUBSETS within EACH POLITICAL GROUP, and then within EACH COUNTRY.
The SELECT Statement then calculates the MAXIMUM Salary of EACH Small SUBSET;
COUNTRY within a POLITICAL GROUP
SELECT PG_ID, COUNTRY_ID, MAX(SALARY) "Maximum Salary" FROM MEPS WHERE PG_ID IS NOT NULL GROUP BY PG_ID, COUNTRY_ID ORDER BY PG_ID, COUNTRY_ID ;
We can see the Result Set output in three consecutive images below. Not all rows are included.
COUNT(*) returns the number of rows.
COUNT(column_name) returns the number of rows with NON NULL values for the column column_name.
COUNT(DISTINCT column_name) returns the number of DISTINCT NON NULL values of the column column_name.
The SELECT Statement below returns the number of rows in table MEPS.
We can see from the output that there are 787 Members of the European Parliament.
SELECT COUNT(*) FROM MEPS;
The SELECT Statement below returns the number of rows in table MEPS, that have NON-NULL Values in column PG_ID.
We can see from the output that there are 757 Members of the European Parliament, who belong to a Political Group.
SELECT COUNT(PG_ID) FROM MEPS;
The same result can be obtained with the following SELECT Statement.
SELECT COUNT(*) FROM MEPS WHERE PG_ID IS NOT NULL;
The SELECT Statement below, returns the number of DISTINCT NON NULL values of column PG_ID in table MEPS.
We already know that there are Seven Political Groups in the European Parliament with Members, and the output confirms that.
SELECT COUNT(DISTINCT PG_ID) FROM MEPS;
Question: How many Members of the Parliament are there per Political Group, in the European Parliament?
The Result Set of the following SELECT Statement answers this question.
SELECT PG_ID, COUNT(MEP_ID) FROM MEPS GROUP BY PG_ID ORDER BY PG_ID
Question: How many distinct countries do the members of each political group belong to?
The Result Set of the following SELECT Statement answers this question.
SELECT PG_ID, COUNT(DISTINCT COUNTRY_ID) FROM MEPS GROUP BY PG_ID ORDER BY PG_ID
HAVING Clause works AFTER;
HAVING clause ELIMINATES SUBSETS / GROUPS from the Output / the Result Set;
Another way of saying this is:
HAVING eliminates Groups of Rows AFTER the Aggregate Function works on the rows.
WHERE eliminates Rows BEFORE the Aggregate Function works on them.
Here is the ORDER of PROCESS:
The SQL Query / SELECT Statement below, retrieves the MINIMUM, AVERAGE, and MAXIMUM salaries of the Members of the European Parliament in EACH Political Group WITHOUT ANY RESTRICTIONS / ELIMINATIONS.
SELECT PG_ID, MIN(SALARY) "Minimum Salary", AVG(SALARY) "Average Salary", MAX(SALARY) "Maximum Salary" FROM MEPS GROUP BY PG_ID ORDER BY PG_ID;
The Result set seen below, lists the MINIMUM, AVERAGE, and MAXIMUM salaries.
It lists the Non-Attached Members as well. Non-Attached Members have a PG_ID of NULL.
The Result set also lists ALL the Political Groups , even if their MINIMUM Salary IS NOT LESS THAN 15000.
We can use the WHERE Clause to ELIMINATE the rows of the Non-Attached Members. That is, Members with a PG_ID value of NULL.
We can use the HAVING Clause to ELIMINATE the SUBSETS divided by the GROUP BY Clause , and the Aggregate/Group Functions APPLIED TO. The result of the Aggregate Function MIN(SALARY) should be GREATER THAN OR EQUAL TO 15000, to be ELIMINATED.
The query below now includes the WHERE Clause which ELIMINATES the rows with a NULL PG_ID.
SELECT PG_ID, MIN(SALARY) "Minimum Salary", AVG(SALARY) "Average Salary", MAX(SALARY) "Maximum Salary" FROM MEPS WHERE PG_ID IS NOT NULL GROUP BY PG_ID ORDER BY PG_ID;
Now, the Result Set of the Query, DOES NOT INCLUDE the Non-Attached Members (PG_ID value NULL) of the European Parliament.
The query below now includes the HAVING Clause which ELIMINATES the GROUPS with MINIMUM Salary NOT LESS THAN 15000.
SELECT PG_ID, MIN(SALARY) "Minimum Salary", AVG(SALARY) "Average Salary", MAX(SALARY) "Maximum Salary" FROM MEPS WHERE PG_ID IS NOT NULL GROUP BY PG_ID HAVING MIN(SALARY) < 15000 ORDER BY PG_ID;
Now, the Result Set of the Query ONLY INCLUDES those Political Groups (SUBSETS of the MEPS table GROUPED BY their PG_ID) if the MINIMUM Salary of that Political Group is LESS THAN 15000.
This resumes the example.
Create a report which will list;
EARLIEST → Oldest / FIRST JOIN_DATE
LATEST → Youngest / LAST JOIN_DATE
Our Report will look like the following.
The numbers are symbolic.
MAX GDP_PPS in Earliest Joiners | MIN GDP_PPS in Latest Joiners |
---|---|
987654321 | 123 |
The Result Set of the SELECT Statement we will write, will only have 1 ROW, and 2 COLUMNS, as seen above.
The 1st Column will hold the "GDP_PPS value of the Country" which has the HIGHEST(MAXIMUM) GDP_PPS among the Earliest Joiners.
The 2nd Column will hold the "GDP_PPS value of the Country" which has the LOWEST(MINIMUM) GDP_PPS among the Latest Joiners.
The data we will work on is in table COUNTRIES.
As the data we will work on is in table COUNTRIES, let's have a feel of our data.
We write the following SELECT Statement.
SELECT COUNTRY_ID , COUNTRY_NAME , JOIN_DATE , GDP_PPS FROM COUNTRIES WHERE STATUS='M' ORDER BY JOIN_DATE ASC , GDP_PPS DESC;
The Result Set of the SELECT Statement can be seen below.
The Result Set of the SELECT Statement with the SUBSET of Earliest Joiners, and the SUBSET of Latest Joiners.
Below is the SELECT Statement which retrieves the data for our Report.
The SELECT Statement uses the Clauses FIRST and LAST.
SELECT MAX(GDP_PPS) KEEP (DENSE_RANK FIRST ORDER BY JOIN_DATE) "MAX GDP PPS Over FIRST Join Date", MIN(GDP_PPS) KEEP (DENSE_RANK LAST ORDER BY JOIN_DATE) "MIN GDP PPS Over LAST Join Date" FROM COUNTRIES WHERE STATUS='M';
The output of our report can be seen below.
FIRST and LAST are both Aggregate and Analytic functions.
They operate on a set of values from a set of rows that rank as the FIRST or LAST with respect to a given sorting specification.
The Aggregate function can be;
The KEEP keyword is for semantic clarity.
It indicates that only the FIRST or the LAST values of the Aggregate function will be returned.
DENSE_RANK FIRST or DENSE_RANK LAST indicates that the database will aggregate over only those rows with the minimum (FIRST) or the maximum (LAST) dense rank, which is also called Olympic Rank.
This resumes the Chapter on Aggregate Functions.