PostgreSQL Query Flow
Parser:
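The parser stage consists of two parts:
The parser, defined in gram.y and scan.l, which is built using the Unix tools
bison and flex.
The transformation process, which modifies and augments the data structures
returned by the parser.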
The parser has to check the query
string (which arrives as plain text) for valid syntax. If the syntax is correct
a parse tree is built up and handed back; otherwise an error is returned.
The parser is defined in the file
gram.y and consists of a set of grammar rules and actions that are executed
whenever a rule is fired. The code of the actions (which is actually C code) is
used to build up the parse tree.
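The idea of a grammar rule with an attached action can be sketched in a few lines of Python. This is only an illustration: it uses a hand-written recursive-descent parser rather than a bison-generated one, the grammar is deliberately tiny, and the names parse_select, RESERVED and the "SelectStmt" dictionary layout are hypothetical, not taken from gram.y. The input is assumed to be already split into tokens.

```python
RESERVED = {"SELECT", "FROM"}  # assumed, tiny reserved-word set


def parse_select(tokens):
    """Parse: SELECT ident {, ident} FROM ident   (a deliberately tiny grammar)."""
    pos = 0

    def expect(word):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].upper() != word:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {word}, found {found}")
        pos += 1

    def ident():
        nonlocal pos
        if (pos >= len(tokens)
                or not tokens[pos].isidentifier()
                or tokens[pos].upper() in RESERVED):
            raise SyntaxError("expected an identifier")
        pos += 1
        return tokens[pos - 1]

    expect("SELECT")
    targets = [ident()]
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        targets.append(ident())
    expect("FROM")
    table = ident()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing input: {tokens[pos]}")
    # The "action" for the rule: build and hand back the parse tree node.
    return {"node": "SelectStmt", "targets": targets, "from": table}


tree = parse_select(["SELECT", "name", ",", "id", "FROM", "users"])
```

Valid input yields a small tree; input that does not match the rule raises SyntaxError instead, which mirrors the "parse tree or error" behaviour described above.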
The lexer is defined in the file
scan.l and is responsible for recognizing identifiers, the SQL key words etc.
For every key word or identifier that is found, a token is generated and handed
to the parser.
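As a rough sketch of what a lexer does, the following Python snippet turns a query string into a stream of tokens. It is a toy stand-in for scan.l, not a port of it: the keyword set, the token kinds (KEYWORD, IDENT, NUMBER, OP) and the function name tokenize are all assumptions made for the example.

```python
import re

# A tiny, assumed keyword set; the real list in PostgreSQL is far longer.
KEYWORDS = {"SELECT", "FROM", "WHERE"}

TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<NUMBER>\d+)
  | (?P<WORD>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<OP>[=<>*,;])
""", re.VERBOSE)


def tokenize(sql):
    """Yield (kind, text) tokens, raising on any character that cannot be matched."""
    pos = 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {sql[pos]!r} at position {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "WORD":
            # Key words are matched case-insensitively; anything else is an identifier.
            kind = "KEYWORD" if text.upper() in KEYWORDS else "IDENT"
        yield kind, text


tokens = list(tokenize("select name from users where id = 42"))
```

Here tokens starts with ("KEYWORD", "select") and ("IDENT", "name"); a parser would consume this stream one token at a time.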
The file scan.l is transformed to
the C source file scan.c using the program flex and gram.y is transformed to
gram.c using bison. After these transformations have taken place a normal C
compiler can be used to create the parser. Never make any changes to the
generated C files as they will be overwritten the next time flex or bison is
called.
Traffic cop:
Parsing is of two types. The terms soft parse and hard parse come from Oracle:
PostgreSQL has no shared SQL area, and its shared buffers cache data pages, not
parsed statements or plans. The closest PostgreSQL equivalent is the
per-session plan cache used for prepared statements, so read "cache" below in
that sense.
Soft parse: the parsed representation of a submitted SQL statement already
exists in the cache. Syntax and semantic checks are still performed, but the
relatively costly query optimization step is skipped, because the cached entry
already holds the execution plan required to execute the statement.
Hard parse: the statement cannot be reused, or it is the very first time it is
submitted, so it goes through the full parse and optimization path. In systems
with a size-limited shared cache, a statement that has been aged out is hard
parsed again when it is resubmitted, so the size of the cache also affects the
number of hard parses.
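The reuse-versus-reparse distinction can be illustrated with a toy bounded cache keyed by statement text. This is a sketch of the general idea only, not a model of PostgreSQL internals: the class PlanCache, the method get_plan, the least-recently-used eviction policy and the fake plan string are all hypothetical.

```python
from collections import OrderedDict


class PlanCache:
    """Toy bounded cache keyed by statement text (hypothetical, not PostgreSQL code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.plans = OrderedDict()
        self.hard_parses = 0
        self.soft_parses = 0

    def get_plan(self, sql):
        if sql in self.plans:
            self.soft_parses += 1
            self.plans.move_to_end(sql)  # mark as most recently used
            return self.plans[sql]
        # Cache miss: stand-in for the expensive parse + optimize work.
        self.hard_parses += 1
        plan = f"plan for <{sql}>"
        self.plans[sql] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)  # age out the least recently used entry
        return plan


cache = PlanCache(capacity=2)
for sql in ["SELECT 1", "SELECT 1", "SELECT 2", "SELECT 3", "SELECT 1"]:
    cache.get_plan(sql)
```

With a capacity of two, the final "SELECT 1" has already been aged out and counts as a hard parse again; a larger capacity would have turned it into a soft parse.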
Query Rewriter:
The PostgreSQL rule system has had two implementations:
The first one worked using row level
processing and was implemented deep in the executor. The rule system was called
whenever an individual row had been accessed. This implementation was removed
in 1995 when the last official release of the Berkeley Postgres project was
transformed into Postgres95.
The second implementation of the
rule system is a technique called query rewriting. The rewrite system is a
module that sits between the parser stage and the planner/optimizer. This
technique is the one still in use today.
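One common use of query rewriting in PostgreSQL is expanding views. The sketch below shows the general shape of such a rewrite on a toy query tree. The dictionary layout, the VIEWS table and the function rewrite are hypothetical and far simpler than the real query tree structures.

```python
# Hypothetical view definitions: view name -> query tree of the view.
VIEWS = {
    "active_users": {"select": ["id", "name"], "from": "users", "where": "active = true"},
}


def rewrite(query):
    """Return a new query tree in which view references become subqueries."""
    source = query["from"]
    if isinstance(source, dict):
        # Already a subquery: rewrite inside it.
        return {**query, "from": rewrite(source)}
    if source in VIEWS:
        # Replace the view name with its (recursively rewritten) definition.
        return {**query, "from": rewrite(VIEWS[source])}
    return query  # a plain table: nothing to rewrite


tree = {"select": ["name"], "from": "active_users", "where": None}
rewritten = rewrite(tree)
```

After the call, rewritten["from"] holds the view's own query tree, so the planner/optimizer only ever sees base tables. The input tree is left unchanged.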
Optimizer:
The task of the planner/optimizer
is to create an optimal execution plan. A given SQL query (and hence, a query
tree) can be actually executed in a wide variety of different ways, each of
which will produce the same set of results. If it is computationally feasible,
the query optimizer will examine each of these possible execution plans,
ultimately selecting the execution plan that is expected to run the fastest.
The planner/optimizer starts by
generating plans for scanning each individual relation (table) used in the
query. The possible plans are determined by the available indexes on each
relation. There is always the possibility of performing a sequential scan on a
relation, so a sequential scan plan is always created.
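The per-relation step can be sketched as follows: build one candidate plan per usable index plus the always-available sequential scan, then keep the cheapest. The cost formulas and constants here are invented for illustration and bear no relation to PostgreSQL's actual cost model. The names scan_paths and cheapest are likewise assumptions.

```python
def scan_paths(table_rows, indexes, filter_column, selectivity):
    """List candidate scan plans for one relation using made-up cost formulas."""
    # A sequential scan is always possible: it reads every row once.
    paths = [("SeqScan", float(table_rows))]
    for index_name, indexed_column in indexes.items():
        if indexed_column == filter_column:
            # Assumed cost: a small fixed lookup cost plus a higher per-row
            # cost for the matching rows only (random access is pricier).
            matching = table_rows * selectivity
            paths.append((f"IndexScan({index_name})", 10.0 + 4.0 * matching))
    return paths


def cheapest(paths):
    """Pick the candidate plan with the lowest estimated cost."""
    return min(paths, key=lambda path: path[1])


indexes = {"users_pkey": "id", "users_email_idx": "email"}
selective = cheapest(scan_paths(100_000, indexes, "id", selectivity=0.0001))
broad = cheapest(scan_paths(100_000, indexes, "id", selectivity=0.9))
```

Under these assumed costs, the highly selective filter picks the index scan, while the filter that matches most of the table falls back to the sequential scan.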
Executor:
The executor takes the plan created
by the planner/optimizer and recursively processes it to extract the required
set of rows. This is essentially a demand-pull pipeline mechanism. Each time a
plan node is called, it must deliver one more row, or report that it is done
delivering rows.
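A minimal sketch of the demand-pull idea, assuming three hypothetical node types (SeqScan, Filter, Limit) that each expose a next_row method returning one row, or None when done. These are illustrative Python classes, not the executor's real node interfaces.

```python
class SeqScan:
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def next_row(self):
        if self.pos >= len(self.rows):
            return None  # done delivering rows
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next_row(self):
        # Pull from the child until a row passes or the child is exhausted.
        while (row := self.child.next_row()) is not None:
            if self.predicate(row):
                return row
        return None


class Limit:
    def __init__(self, child, count):
        self.child = child
        self.remaining = count

    def next_row(self):
        if self.remaining <= 0:
            return None  # stop without pulling anything more from the child
        self.remaining -= 1
        return self.child.next_row()


def run(plan):
    """Drive the top node until it reports that it is done."""
    rows = []
    while (row := plan.next_row()) is not None:
        rows.append(row)
    return rows


users = [{"id": 1, "age": 17}, {"id": 2, "age": 30},
         {"id": 3, "age": 45}, {"id": 4, "age": 52}]
scan = SeqScan(users)
plan = Limit(Filter(scan, lambda r: r["age"] >= 18), 2)
result = run(plan)
```

Because rows are pulled on demand, the scan stops after the third row: once Limit has delivered two rows it never asks its child for another.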
The executor mechanism is used to
evaluate all four basic SQL query types: SELECT, INSERT, UPDATE, and DELETE.