PostgreSQL Query Flow

- January 10, 2020

Parser:

The parser stage consists of two parts:

The parser defined in gram.y and scan.l is built using the UNIX tools bison and flex.

The parser has to check the query string (which arrives as plain text) for valid syntax. If the syntax is correct a parse tree is built up and handed back; otherwise an error is returned.

The parser is defined in the file gram.y and consists of a set of grammar rules and actions that are executed whenever a rule is fired. The code of the actions (which is actually C code) is used to build up the parse tree.

The lexer is defined in the file scan.l and is responsible for recognizing identifiers, the SQL key words etc. For every key word or identifier that is found, a token is generated and handed to the parser.

The file scan.l is transformed to the C source file scan.c using the program flex and gram.y is transformed to gram.c using bison. After these transformations have taken place a normal C compiler can be used to create the parser. Never make any changes to the generated C files as they will be overwritten the next time flex or bison is called.

Traffic cop:

Parsing is Two Types:

Soft Parse – when the parsed representation of a submitted SQL statement exists in the Postgres Server (Shared Buffer) Performs syntax and semantic checks but avoids the relatively costly operation of query optimization. Reuses the existing Postgres SQL area which already has the execution plan required to execute the SQL statement

Hard Parse – if a statement cannot be reused or if it the very first time the SQL statement is being loaded in the Postgres Server (Shared Buffer), it results in a hard parse. Also when a statement is aged out of the Postgres Server (Shared Buffer) (because the Postgres Server (Shared Buffer) is limited in size), when it is reloaded again, it results in another hard parse. So size of the shared Buffer can also affect the amount of parse calls.

Query Rewriter:

PostgreSQL rule system consisted of two implementations:

The first one worked using row level processing and was implemented deep in the executor. The rule system was called whenever an individual row had been accessed. This implementation was removed in 1995 when the last official release of the Berkeley Postgres project was transformed into Postgres95.

The second implementation of the rule system is a technique called query rewriting. The rewrite system is a module that exists between the parser stage and the planner/optimizer. This technique is still implemented.

Optimizer:

The task of the planner/optimizer is to create an optimal execution plan. A given SQL query (and hence, a query tree) can be actually executed in a wide variety of different ways, each of which will produce the same set of results. If it is computationally feasible, the query optimizer will examine each of these possible execution plans, ultimately selecting the execution plan that is expected to run the fastest.

The planner/optimizer starts by generating plans for scanning each individual relation (table) used in the query. The possible plans are determined by the available indexes on each relation. There is always the possibility of performing a sequential scan on a relation, so a sequential scan plan is always created

Executor:

The executor takes the plan created by the planner/optimizer and recursively processes it to extract the required set of rows. This is essentially a demand-pull pipeline mechanism. Each time a plan node is called, it must deliver one more row, or report that it is done delivering rows.

The executor mechanism is used to evaluate all four basic SQL query types: SELECT, INSERT, UPDATE, and DELETE.

Search This Blog

Pearl DB Service