PostgreSQL Background Processes
1. bgwriter
2. checkpointer
3. stats collector
4. syslogger
5. autovacuum launcher
6. WAL writer
7. archiver

Bgwriter:

When a PostgreSQL server process reads data from disk, it first moves the page that contains the data into the shared buffer pool. The shared buffer pool is so named because it's a region of memory that's shared by all server processes that access a given cluster. Another way to look at it is that the shared buffer pool is shared by all processes spawned by a single postmaster. When the shared buffer pool fills, PostgreSQL starts pushing old pages out of the pool to make room for new ones. PostgreSQL uses a clock-sweep algorithm (an approximation of LRU, least recently used) to select the page it evicts from the pool. If PostgreSQL chooses a page that has not been modified since it was placed in the pool, that page is simply discarded. On the other hand, if PostgreSQL chooses a page that has been modified, it must write the page to disk.
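As a rough illustration, the contrib extension pg_buffercache lets you peek into the shared buffer pool and count how many of its pages are currently dirty. A minimal sketch, assuming the extension is available and you have the privileges to install it:

    -- pg_buffercache ships with the standard contrib extensions
    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- Count clean vs. dirty pages currently held in the shared buffer pool
    SELECT isdirty, count(*) AS buffers
    FROM pg_buffercache
    GROUP BY isdirty;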
The BGWRITER improves overall performance in two ways. First, it tries to increase the number of free pages (or at least, the number of unmodified pages) in the shared buffer pool so that individual server processes won't have to wait for disk writes. Second, it decreases the number of modified pages found in the shared buffer pool when a CHECKPOINT occurs, which smooths out the I/O spikes caused by checkpoints.
You can tune the BGWRITER with three configuration parameters. In modern releases these are bgwriter_delay, bgwriter_lru_maxpages, and bgwriter_lru_multiplier (older releases used bgwriter_percent and bgwriter_maxpages instead). bgwriter_delay controls how long the BGWRITER process sleeps between rounds; bgwriter_lru_maxpages and bgwriter_lru_multiplier limit the number of pages it flushes during each round.
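A minimal tuning sketch using the modern parameter names (the values are illustrative, not recommendations; all three settings can be reloaded without a restart):

    -- Current settings
    SHOW bgwriter_delay;            -- sleep between rounds (default 200ms)
    SHOW bgwriter_lru_maxpages;     -- max pages written per round (default 100)
    SHOW bgwriter_lru_multiplier;   -- scales the estimate of pages needed (default 2.0)

    -- Make the bgwriter more aggressive, then reload the configuration
    ALTER SYSTEM SET bgwriter_delay = '100ms';
    ALTER SYSTEM SET bgwriter_lru_maxpages = 400;
    SELECT pg_reload_conf();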

The bgwriter writes dirty pages to the data files (the base directory) when:

  • Shared buffers are full
  • The server is started, stopped, or restarted
  • CREATE DATABASE is executed
  • pg_start_backup is executed (shown below)
  • pg_stop_backup is executed
  • A checkpoint occurs (checkpoint_warning defaults to 30 s, checkpoint_timeout to 5 min)
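For instance, the backup functions above can be invoked from psql. A minimal sketch; note that pg_start_backup/pg_stop_backup were renamed to pg_backup_start/pg_backup_stop in PostgreSQL 15, and the label below is arbitrary:

    -- PostgreSQL 14 and earlier; forces a checkpoint before returning
    SELECT pg_start_backup('nightly_base_backup');
    -- ... copy the data directory with an external tool ...
    SELECT pg_stop_backup();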

Checkpointer:

Checkpoints define the consistent state of the database. They are critical, and it is important that checkpoints occur regularly enough to ensure that data changes are permanently saved to disk and that the database is in a consistent state at all times.

Checkpoints will ensure:


  • All committed data is written to the data files on disk.
  • Commit log (clog) files are updated with the commit status.
  • Transaction log files in the pg_xlog (now pg_wal) directory are recycled.

That explains how I/O-intensive checkpoints are. There are parameters in postgresql.conf that can be configured/tuned to control checkpoint behavior: max_wal_size, min_wal_size, checkpoint_timeout and checkpoint_completion_target. These parameters decide how frequently checkpoints occur and how much time a checkpoint is given to finish.
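A sketch of how these knobs might be adjusted (the values are illustrative, not recommendations):

    -- Spread checkpoints out and give them more WAL headroom
    ALTER SYSTEM SET checkpoint_timeout = '15min';        -- default 5min
    ALTER SYSTEM SET max_wal_size = '4GB';                -- default 1GB
    ALTER SYSTEM SET min_wal_size = '1GB';                -- default 80MB
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- spread writes over 90% of the interval
    SELECT pg_reload_conf();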
Four reasons why a checkpoint can be triggered:

  • checkpoint_timeout seconds have elapsed since the last checkpoint
  • The amount of WAL written since the last checkpoint is about to exceed max_wal_size
  • The CHECKPOINT command is executed directly
  • A command that requires a checkpoint is executed (e.g. pg_start_backup, pg_stop_backup, CREATE DATABASE, or pg_ctl stop|restart and a few others)
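You can trigger a checkpoint manually and check how often checkpoints are timed versus requested. The counters below live in pg_stat_bgwriter up to PostgreSQL 16 (they moved to pg_stat_checkpointer in 17):

    -- Force an immediate checkpoint (superuser, or the pg_checkpoint role in 15+)
    CHECKPOINT;

    -- Mostly timed checkpoints are healthy; many requested ones suggest
    -- max_wal_size is too small for the write workload
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
    FROM pg_stat_bgwriter;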

Stats collector:

The PostgreSQL statistics collector is a subsystem that supports collection and reporting of information about server activity. Presently, the collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table. It can also count calls to user-defined functions and the total time spent in each one.
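The collected numbers are exposed through the pg_stat_* views. For example:

    -- Per-table access counters, row counts, and vacuum/analyze history
    SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del,
           n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables;

    -- Per-function call counts and timings (requires track_functions = 'pl' or 'all')
    SELECT funcname, calls, total_time, self_time
    FROM pg_stat_user_functions;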

Syslogger:

PostgreSQL supports several methods for logging server messages, including stderr, csvlog and syslog. On Windows, eventlog is also supported. Set the log_destination parameter to a list of desired log destinations separated by commas. The default is to log to stderr only. This parameter can only be set in the postgresql.conf file or on the server command line.
The syslogger process writes these messages into log files. The utility processes, user backends, and the postmaster daemon are attached to the syslogger process, which logs information about their activities.
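A minimal logging setup sketch (logging_collector requires a server restart to take effect; the file name pattern is illustrative):

    ALTER SYSTEM SET log_destination = 'stderr,csvlog';
    ALTER SYSTEM SET logging_collector = on;               -- restart required
    ALTER SYSTEM SET log_directory = 'log';                -- relative to the data directory
    ALTER SYSTEM SET log_filename = 'postgresql-%Y-%m-%d.log';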

Autovacuum launcher:

PostgreSQL has an optional but highly recommended feature called autovacuum, whose purpose is to automate the execution of VACUUM and ANALYZE commands. When enabled, autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility; therefore, autovacuum cannot be used unless track_counts is set to true. In the default configuration, autovacuuming is enabled and the related configuration parameters are appropriately set.
The “autovacuum daemon” actually consists of multiple processes. There is a persistent daemon process, called the autovacuum launcher, which is in charge of starting autovacuum worker processes for all databases. The launcher will distribute the work across time, attempting to start one worker within each database every autovacuum_naptime seconds. A maximum of autovacuum_max_workers worker processes are allowed to run at the same time.
If there are more than autovacuum_max_workers databases to be processed, the next database will be processed as soon as the first worker finishes. Each worker process will check each table within its database and execute VACUUM and/or ANALYZE as needed. log_autovacuum_min_duration can be set to monitor autovacuum workers' activity.
If several large tables all become eligible for vacuuming in a short amount of time, all autovacuum workers might become occupied with vacuuming those tables for a long period. This would result in other tables and databases not being vacuumed until a worker becomes available. There is no limit on how many workers might be in a single database, but workers do try to avoid repeating work that has already been done by other workers. Note that the number of running workers does not count towards the max_connections or superuser_reserved_connections limits.
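A sketch of the cluster-wide knobs mentioned above, plus a per-table override; the table name is hypothetical and the values are illustrative:

    -- Cluster-wide settings
    ALTER SYSTEM SET autovacuum = on;                        -- on by default
    ALTER SYSTEM SET track_counts = on;                      -- required by autovacuum
    ALTER SYSTEM SET autovacuum_naptime = '1min';            -- default
    ALTER SYSTEM SET autovacuum_max_workers = 5;             -- default 3; restart required
    ALTER SYSTEM SET log_autovacuum_min_duration = '250ms';  -- log slow autovacuum runs
    SELECT pg_reload_conf();

    -- Vacuum a (hypothetical) busy table at 2% dead tuples instead of the default 20%
    ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);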

WAL writer:

When we make changes to the data, the changes are first written to the shared buffers, and records of these changes are written to the WAL buffers; the records are flushed to the WAL segments when the transaction commits. The WAL writer process is responsible for flushing these records from the WAL buffers to the WAL files.
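The relevant knobs, as a sketch:

    SHOW wal_writer_delay;         -- how long the WAL writer sleeps between flushes (default 200ms)
    SHOW wal_writer_flush_after;   -- flush when this much WAL has accumulated (default 1MB)

    -- Trade durability for commit latency: COMMIT no longer waits for the
    -- WAL flush, which is left to the WAL writer to perform shortly after
    ALTER SYSTEM SET synchronous_commit = off;
    SELECT pg_reload_conf();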

Archiver process:

As you keep modifying the data in the databases on a server, WAL files keep getting generated (and discarded after a while). If you save a copy of each WAL file that was generated, you can replay the entire set of changes into another server. Doing this in real time is better known as replication. Saving all generated WAL files to a safe offline location essentially becomes an incremental backup. In PostgreSQL terms, copying out generated WAL files is called archiving, and getting the server to read in a WAL file and apply it is called restoring.
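A minimal archiving sketch; /mnt/server/archivedir is a placeholder for your own safe location, and archive_mode requires a restart:

    ALTER SYSTEM SET archive_mode = on;   -- restart required
    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

    -- On a server restoring from the archive, the mirror-image setting is:
    -- restore_command = 'cp /mnt/server/archivedir/%f %p'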
