108 views
Special Homework SPR22: Extending SQL Island with Common Table Expressions === *Special Homework of the module Data Management in MSE Spring 2022* *Authors (Group N3): Stephanie Ruch, Ivan Ziegler, Nick Meier, Roman Loop, Samuel Kolb* <!-- # Table of Contents [toc] --> # Common Table Expressions (CTE) ## What are CTEs? A Common Table Expression (CTE) is the result set of a query which exists temporarily and for use only within the context of a larger query. Much like a derived table, the result of a CTE is not stored and exists only for the duration of the query [1]. A CTE always returns a result set, which then can be referenced by another SQL statement (select, insert, delete or update). CTEs are SQL-compliant and part of the ANSI SQL 99 specification [2]. Lets make that a bit more understandable with an example: <!-- ![](https://www.essentialsql.com/wp-content/uploads/2016/06/Common-Table-Expression-Parts-e1465921570548.png?ezimgfmt=ng:webp/ngcb23) --> <div style="text-align:center"> <img src="https://www.essentialsql.com/wp-content/uploads/2016/06/Common-Table-Expression-Parts-e1465921570548.png?ezimgfmt=ng:webp/ngcb23" width=600 alt="Neo4J graph structure"/> <figcaption><i>Figure 1</i>: Example of a CTE query</figcaption> </div> The blue part ist the CTE itself. The yellow part is the "main" query which uses the CTE or more explicit the temporary stored result set of the CTE. In the next part we will explain all elements of the CTE. ## CTE Syntax A CTE contains of the following elements: * CTEs are initated with the keyword ```WITH``` * Followed by a (meaningful) name for the result query * *Optional: Column mapping in brackets* * Followed by the keyword ```AS``` and open and close brackets ```()``` * Within the brackets the CTE query will be defined * *Note: When chaining multiple CTEs they must be seperated with a **comma*** Here we provide a more generic form of a CTE structure. ```sql WITH cte_name_1 (column_name_1, column_name_2, column_name_X) -- Define 1st CTE name and column list. AS (CTE query definition 1) -- Define the first CTE query. , -- Use a comma to separate multiple CTE definitions. cte_name_X (column_name_1, column_name_2, column_name_X) -- Define 2nd CTE name and column list. AS (CTE query definition X) -- Define the second CTE query. -- Define the outer query referencing the CTE name SELECT column_A, column_B, ... FROM cte_name_1 ``` ## CTE types CTEs are categorized as non-recursive and recursive. ### Non-recursive CTE What we presented so far are non-recursive CTEs, their general syntax is: ```sql WITH cte_name AS (cte_query_definition) SELECT * FROM cte_name; ``` ### Recursive CTE A recursive CTE references itself. It returns the result subset, then it repeatedly (recursively) references itself, and stops when it returns all the results. The syntax for a recursive CTE is not too different from that of a non-recursive CTE: ```sql WITH RECURSIVE cte_name AS ( cte_query_definition (the anchor member) UNION ALL cte_query_definition (the recursive member) ) SELECT * FROM cte_name; ``` The recursive CTE starts with ```WITH``` and is then followed by ```RECURSIVE```, contrary to the non-recursive CTE. To connect the anchor member with the recursive member, you need to use the ```UNION``` or ```UNION ALL``` command. Recursive CTEs are mainly used to query hierarchical data or graphs. [3], [4] ## Why do we need CTE's? CTEs enable users to write and maintain complex queries more easily. They increase **readability** and **simplification**. The reduction in complexity is achieved by splitting complex queries into simple blocks, which could be reused if necessary. [4] ## Increase readability with CTE (Example) The following example works with data on the results of a long jump competition. Two tables are provided: **Participant data:** ```participant``` – Stores information about contest participants: ```participant_id``` – A unique ID for each participant. ```first_name``` – The participant’s first name. ```last_name``` – The participant’s last name. **Jump data:** ```jump``` – Stores information about the jumps made by the participants: ```jump_id``` – The ID of each jump. ```participant_id``` – The ID of the participant who made the jump. ```contest_id``` – The ID of the contest in which the jump was made. ```length``` – Jump length, in centimeters. <!--![](https://md.coredump.ch/uploads/upload_9b9b1697a15670e2ee63dcec1d2053a1.png) --> Now we want to write a query which is showing the largest average jump length by any participant. SQL-Query using Subqueries: ```sql SELECT MAX(avg_length) AS max_length FROM ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.participant_id GROUP BY first_name, last_name ) AS average_lengths; ``` SQL-Query using CTE: ```sql WITH average_lengths AS ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.participant_id GROUP BY first_name, last_name ) SELECT MAX(avg_length) AS max_length FROM average_lengths; ``` This returns exactly the same result as the previous example with the subquery: the largest average jump length by any participant. However, instead of writing our query inside the FROM clause, we’ve put it in the WITH clause (which comes before the main query). That makes it way more readable and maintainable. # SQL Island ## Introduction to SQL Island [SQL Island](http://sql-island.informatik.uni-kl.de/?lang=en) is an adventure game controlled by sending SQL queries which teach the fundamentals of the language in an entertaining and funny way. The player learns how to search and manipulate data in a relational database. Prior knowledge of the SQL language is not required. After the survived plane crash, you will be stuck on SQL Island for the time being.” [5] ## What happend until now? Our dear friend Sqlaus had a plane crash and is the only survivor. Stranded on SQL island, he meets the people which only speak sql. Information about villages, inhabitants and available items are stored in the following three tables. ```sql DORF (dorfnr, name, haeuptling) BEWOHNER (bewohnernr, name, dorfnr, geschlecht, beruf, gold, status) GEGENSTAND (gegenstand, besitzer) ``` Since Sqlaus has no money, he needs to find a job to survive on the island until he can go home again. To get out of the island he needs to find the pilot. But it turns out that the pilot was kidnapped. To free the pilot, he kills the man who is holding him captive. <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_42d23dc64e6dff36e4af781ec46a1f97.PNG" width=600 alt="Neo4J graph structure"/> <figcaption><i>Figure 3</i>: Overview of the characters and the most important items</figcaption> </div> ## Continuation of the story Sqlaus and the pilot are on their way to the airport. When they arrive, they find out that the tropical storm of the last few days has devastated the entire airport as well as the plane and some of its parts have been scattered across the island. ### Get a list of the missing airplane parts To get the plane flying again, they first have to find all the missing parts of the airplane. Fortunately, an airport employee has already checked with the island's chiefs to see where the parts are and if they are still functional, and then made a list with the missing parts, location and status. ```sql DROP TABLE IF EXISTS missing_aircraft_parts; CREATE TABLE missing_aircraft_parts ( "part" VARCHAR(60) NOT NULL UNIQUE, "location" VARCHAR(60), "village" VARCHAR(60) , "status" VARCHAR(60) ); INSERT INTO missing_aircraft_parts ("part", "location", "village", "status") VALUES ('Engine', 'South Beach', 'Gurkendorf', 'functional'), ('Propeller', 'SQL Forest', 'Gurkendorf', 'broken'), ('Wheels', 'Village', 'Zwiebelhausen', 'functional'), ('Seat', 'SQL Forest', 'Gurkendorf', 'functional'), ('Window', 'Village', 'Zwiebelhausen', 'broken'); SELECT * FROM missing_aircraft_parts ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_e18e27a04679b3f45c4bb1defcb9b233.png" width=700 alt=""/> </div> <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_13bbb185f7b0b5bb15201d2a925fa08e.PNG" width=500 alt=""/> <figcaption><i>Figure 4</i>: Map of SQL-Island and the scattered parts of the airplane</figcaption> </div> ### Find friendly people to help with the search and retrieval of the parts To help finding the parts Sqlaus needs to ask all the friendly people from each village in which the parts were scattered. ```sql WITH cte1 (bewohnernr, name, dorfnr, beruf) -- Define the CTE name and column list AS -- Define the first CTE query. ( SELECT dorf.dorfnr, dorf.name, dorf.haeuptling, bewohner.beruf, bewohner.status AS status_haeupt, bewohner.gold FROM dorf LEFT JOIN bewohner ON bewohner.bewohnernr = dorf.haeuptling ) , -- Use a comma to separate multiple CTE definitions. cte2 AS -- Define the second CTE query ( SELECT bewohner.bewohnernr, bewohner.name, bewohner.status, cte1.status_haeupt, bewohner.dorfnr, cte1.name AS name_dorf, bewohner.beruf FROM bewohner LEFT JOIN cte1 ON bewohner.dorfnr = cte1.bewohnernr ) -- Define the outer query SELECT name, name_dorf FROM cte2 WHERE name_dorf IN ('Gurkendorf', 'Zwiebelhausen') AND status = 'friedlich' ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_59bd65b2794934aca3218d1bdc7d3d2b.png" width=400 alt=""/> </div> The friendly people from Zwiebelhausen and Gurkendorf kindly help to return the broken parts to the airport. ### Find a mechanic In a next step they need a mechanic to repair the broken parts of the aircraft. But because the häuptling of the village of the mechanic should be friendly they first need to find out which mechanic has a friendly häuptling. Otherwise the häutpling could forbid the mechanic to repair the airplane for Sqlaus and the pilot. ```sql WITH cte (bewohnernr, name, dorfnr, beruf) -- Define the CTE name and column list AS -- Define the first CTE query. ( SELECT dorf.dorfnr, dorf.name, dorf.haeuptling, bewohner.beruf, bewohner.status AS status_haeupt, bewohner.gold FROM dorf LEFT JOIN bewohner ON bewohner.bewohnernr = dorf.haeuptling ), cte2 AS -- Define the second CTE query ( SELECT bewohner.bewohnernr, bewohner.name, bewohner.status, cte.status_haeupt, bewohner.dorfnr, cte.name, bewohner.beruf FROM bewohner LEFT JOIN cte ON bewohner.dorfnr = cte.bewohnernr ) -- Define the outer query SELECT * FROM cte2 WHERE status_haeupt = 'friedlich' AND beruf like '%schmied' AND status = 'friedlich'; ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_9c05f2e1eef0b1fb7019d19ba6af15ee.png" width=800 alt=""/> </div> Luckily there are 2 mechanics in Zwiebelhausen which have a friendly häuptling. So Ernst Peng will help to repair the airplane. But he needs a rope to get the engine back on the airplane. ### Search for a rope Thus Sqlaus and the pilot start to search for a rope: ```sql WITH cte (bewohnernr, name , dorfnr, gegenstand, status) -- Define CTE name and column list AS -- Define the CTE query. ( SELECT bewohnernr, name, dorfnr, gegenstand.gegenstand, status FROM bewohner LEFT JOIN gegenstand ON gegenstand.besitzer = bewohner.bewohnernr ) -- Define the outer query SELECT * FROM cte WHERE gegenstand = 'Seil'; ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_630a069ebfb533c6a9d2f03eca0206b9.png" width=800 alt=""/> </div> The rope belongs to Erich Rasenkopf but unfortunately he wants 100 gold for it. They don't have enough money yet. ### Come up with an ingenious plan to get money to buy the rope Maybe there is an evil häutpling which the inhabitants of this village don't like. ```sql WITH cte (dorfnr) AS ( SELECT bewohner.dorfnr, dorf.name, dorf.haeuptling, bewohner.beruf, bewohner.status AS status_haeupt, bewohner.gold, bewohner.bewohnernr FROM dorf LEFT JOIN bewohner ON bewohner.bewohnernr = dorf.haeuptling) SELECT dorfnr, name, bewohnernr, status_haeupt, gold, beruf FROM cte WHERE status_haeupt = 'boese'; ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_30f7b1ba221f4000d6d797025b8f6cff.png" width=800 alt=""/> </div> So there is an evil häutpling which lives in Gurkendorf. Additionally, he has 4850 gold. Sqlaus will look for friendly inhabitants in Gurkendorf which live with the evil häuptling: ```sql WITH cte (bewohnernr, name, dorfnr, beruf) AS (SELECT dorf.dorfnr, dorf.name, dorf.haeuptling, bewohner.beruf, bewohner.status as status_haeupt, bewohner.gold from dorf LEFT JOIN bewohner on bewohner.bewohnernr = dorf.haeuptling), cte2 AS (Select bewohner.bewohnernr, bewohner.name, bewohner.status, cte.status_haeupt, bewohner.dorfnr, cte.name, bewohner.beruf, bewohner.gold from bewohner LEFT JOIN cte ON bewohner.dorfnr = cte.bewohnernr) SELECT * FROM cte2 WHERE dorfnr = 2 and status = 'friedlich'; ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_61ac0acb08074959f32a3ddce5f2605a.png" width=800 alt=""/> </div> There are 3 inhabitants which are friendly. Sqlaus and the pilot ask them if they would like to get rid of their häuptling and split his money by 4. ```sql SELECT gold, round(gold/4, 2) AS gold_per_person FROM bewohner WHERE bewohnernr=6; ``` <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_7d12df71b701870cc3af05c667127075.png" width=300 alt=""/> </div> Each person would get 1212.5 gold. The three inhabintants aggree and togehter they get rid of the häuptling of Gurkendorf. It is a win win situation, the inhabitants of Gurkendorf can now elect a friendly häuptling and Sqlaus and the pilot have enough money to buy the rope to repair the airplane. ### Happy end Sqlaus can now fly home with the pilot and the repaired airplane. And they lived happily ever after! <div style="text-align:center"> <img src="https://md.coredump.ch/uploads/upload_021428392ab8c5ceab673636fde60d1b.PNG" width=600 alt=""/> <figcaption><i>Figure 5</i>: Sqlaus and the pilot flying home</figcaption> </div> # APPENDIX: Bibliography <!--Hauer, Philipp. (2015). Why Relational Databases are not the Cure-All. https://phauer.com/2015/relational-databases-strength-weaknesses-mongodb/--> - [1] Babic, Tihomir. (2021). What Is a Recursive CTE in SQL?. https://learnsql.com/blog/sql-recursive-cte/ - [2] Welch, AJ. (2021). Common Table Expressions: When and How to Use Them. https://chartio.com/resources/tutorials/using-common-table-expressions/ - [3] Wenzel, Kris. (2022). Common Table Expressions – The Ultimate Guide. https://www.essentialsql.com/introduction-common-table-expressions-ctes/ - [4] Shanu, Syed. (2017). SQL Server Common Table Expressions (CTE). https://www.sqlshack.com/sql-server-common-table-expressions-cte/ - [5] Bladoszewski, Kamil. (2020). Subquery vs. CTE: A SQL Primer. https://learnsql.com/blog/sql-subquery-cte-difference/ - [6] Schildgen, Johannes. (2015). SQL-Grundlagen spielend lernen mit dem Text-Adventure SQL Island. https://sql-island.informatik.uni-kl.de/ # APPENDIX: Quiz ## Questions: 1. When querying with a CTE an additional table is generated: - Where is it stored? - How long will it be available? 2. Given below is an example of a subquery. The same query can be converted into a CTE. - How would the query look like as a CTE? - What are the advantages of a CTE over a subquery? ```sql SELECT MAX(avg_grade) as best_grade FROM ( SELECT student, avg(grade) as avg_grade, FROM students GROUP BY student ) as student_grades; ``` 3. In a CTE multiple tables can be defined, make an example how the syntax for multiple tables looks. 4. Describe what code would be placed at the marked positions in a recursive CTE. ```sql WITH cte_table AS ( -- CODE-PART-1 -- UNION ALL -- CODE-PART-2 -- ) SELECT * FROM cte_table; ``` <!-- 5. Lets have a look on the hirarchy of a company: ![organigram showing the hirarchy of a company](https://md.coredump.ch/uploads/upload_45e198e41454ee4ffc032d0485154949.png) The same information can be represented by the following table: | id | name | boss_id | hirarchy_level | |:---:|:------:|:-------:|:--------------:| | 1 | Walter | | 0 | | 2 | Heidi | 1 | 1 | | 3 | Maria | 2 | 2 | | 4 | Hans | 1 | 1 | | 5 | Rudolf | 4 | 2 | | 6 | Ruth | 4 | 2 | Create a recursive CTE wich recreates the last column (hirarchy_level) based on the first 3 (id, name, boss_id). --> ## Answers: 1. When querying with a CTE an additional table is generated in the memory of the server, the temporary table is only available for the current query and will be deleted afterwards. 2. When rewriting the example of a subquery the result might look like this: ```sql WITH student_grades as ( SELECT student, avg(grade) as avg_grade, FROM students GROUP BY student ) SELECT MAX(avg_grade) as best_grade FROM student_grades; ``` Both of queries return the same result, the advantage of CTE's is that the query is easier to read. This is especially true for more complex queries. 3. The following example shows how multiple tables in CTE's can be defined: ```sql WITH o as ( SELECT customer_id, SUM(price) as customer_money_spent FROM orders GROUP BY customer_id ), r as ( SELECT customer_id, AVG(rating) as customer_avg_rating FROM ratings GROUP BY customer_id ) SELECT customers.customer_id, customers.customer_name, o.customer_order_count, o.customer_money_spent, r.customer_avg_rating FROM customers LEFT JOIN o ON o.customer_id = customers.customer_id LEFT JOIN r ON r.customer_id = customers.customer_id ``` 4. A recursive CTE mostly consists of the following components:\ **PART 1:** \ An initial query, also called the anchor member, it describes how the first values will be retrieved. They are handled separately, as this query most often doesn't contain recursive elements. \ **PART 2:** \ The recursive query. It uses the table of the CTE in order to recursively aquire the values. <!-- 5. The following query in a first step (initial query) searches for the employee with NULL as boss (which means it has no boss registered, an therefore might be the owner or president of the company). The Level for this employee is set to 0 (top). The second part of the table is built with the recursive query. For the next rows it searches for employees where the boss_id is present in the CTE-Table (hierarchy_cte). The, hirarchy_level, incremented by 1, of the boss will be assigned to the employee. ```sql WITH RECURSIVE hierarchy_cte AS ( -- initial query (not recursive) SELECT *, 0 AS hierarchy_level -- set first value FROM employees WHERE boss_id IS NULL UNION ALL -- recursive query SELECT e.*, h.hierarchy_level + 1 FROM employees e, hierarchy_cte h -- accessing itself WHERE e.boss_id = h.id ) SELECT * FROM hierarchy_cte; ``` -->