Introduction
As you may know the PostgreSQL database page contains a lot of informations that is documented here. A great study has been done by Frits Hoogland in this series of blogposts. I strongly recommend to read Frits series before to read this blog post (unless you are familiar with PostgreSQL block internals).
By reading the contents of a page we can extract:
- The percentage of free space within a page
- The percentage of current rows within a page
- The percentage of unused rows within a page
Welcome to pgdfv
pgdfv stands for: PostgreSQL data file visualizer. It helps to visualize the data file pages in a easy way.
For each block:
- A color is assigned (depending of the percentage of free space)
- The color can be displayed in 2 ways (depending if more than 50 percent of the rows are current)
- A number is assigned (based on the percentage of unused rows)
At the end the utility provides a summary for all the blocks visited. As a picture is worth a thousand words, let’s see some examples.
Examples
Let’s create a table, insert 4 rows in it and inspect its content:
pgdbv=# create table bdtable(id int not null, f varchar(30) );
CREATE TABLE
pgdbv=# insert into bdtable ( id, f ) values (1, 'aaaaaaaaaa');
INSERT 0 1
pgdbv=# insert into bdtable ( id, f ) values (2, 'aaaaaaaaaa');
INSERT 0 1
pgdbv=# insert into bdtable ( id, f ) values (3, 'aaaaaaaaaa');
INSERT 0 1
pgdbv=# insert into bdtable ( id, f ) values (4, 'aaaaaaaaaa');
INSERT 0 1
pgdbv=# checkpoint;
CHECKPOINT
pgdbv=# select * from heap_page_items(get_raw_page('bdtable',0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+----------------------------------
1 | 8152 | 1 | 39 | 1945 | 0 | 0 | (0,1) | 2 | 2050 | 24 | | | \x010000001761616161616161616161
2 | 8112 | 1 | 39 | 1946 | 0 | 0 | (0,2) | 2 | 2050 | 24 | | | \x020000001761616161616161616161
3 | 8072 | 1 | 39 | 1947 | 0 | 0 | (0,3) | 2 | 2050 | 24 | | | \x030000001761616161616161616161
4 | 8032 | 1 | 39 | 1948 | 0 | 0 | (0,4) | 2 | 2050 | 24 | | | \x040000001761616161616161616161
(4 rows)
In PostgreSQL, each table is stored in a separate file. When a table exceeds 1 GB, it is divided into gigabyte-sized segments.
Let’s check which file contains the table:
pgdbv=# SELECT pg_relation_filepath('bdtable');
pg_relation_filepath
----------------------
base/16416/16448
Let’s use the utility on this file:
So one block is detected, with more than 75% of free space (so the green color), more than 50% of the rows are current (100% in our case as tx_max = 0 for all the rows) and unused are less than 10% (0 is displayed) (0% in our case as no rows with lp_flags = 0).
Let’s delete, 3 rows:
pgdbv=# delete from bdtable where id <=3;
DELETE 3
pgdbv=# checkpoint;
CHECKPOINT
pgdbv=# select * from heap_page_items(get_raw_page('bdtable',0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+----------------------------------
1 | 8152 | 1 | 39 | 1945 | 1949 | 0 | (0,1) | 8194 | 258 | 24 | | | \x010000001761616161616161616161
2 | 8112 | 1 | 39 | 1946 | 1949 | 0 | (0,2) | 8194 | 258 | 24 | | | \x020000001761616161616161616161
3 | 8072 | 1 | 39 | 1947 | 1949 | 0 | (0,3) | 8194 | 258 | 24 | | | \x030000001761616161616161616161
4 | 8032 | 1 | 39 | 1948 | 0 | 0 | (0,4) | 2 | 2306 | 24 | | | \x040000001761616161616161616161
(4 rows)
and launch the tool again:
As you can see we still have more than 75% of free space in the block but the way to display the color has been changed (because now less than 50% of the rows are current aka tx_max = 0) and unused is still less than 10% (0 is displayed).
Let’s vacuum the table:
pgdbv=# vacuum bdtable;
VACUUM
pgdbv=# checkpoint;
CHECKPOINT
pgdbv=# select * from heap_page_items(get_raw_page('bdtable',0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+----------------------------------
1 | 0 | 0 | 0 | | | | | | | | | |
2 | 0 | 0 | 0 | | | | | | | | | |
3 | 0 | 0 | 0 | | | | | | | | | |
4 | 8152 | 1 | 39 | 1948 | 0 | 0 | (0,4) | 2 | 2306 | 24 | | | \x040000001761616161616161616161
(4 rows)
and launch the utility:
As you can see we still have more than 75% of free space, less than 50% of the rows are current (only one in our case) and now there is between 70 and 80% of unused rows in the block (so 7 is displayed).
The legend and summary are dynamic and depend of the contents of the scanned blocks.
For example, on a newly created table (made of 355 blocks) you could end up with something like:
Then, you delete half of the rows:
Then, once the table has been vacuum:
And once new rows have been inserted:
Remarks
- The tool is available in this repository.
- The tool is 100% inspired by Frits Hoogland blogpost series and by odbv (written by Kamil Stawiarski) that can be used to visualize oracle database blocks.
Conclusion
pgdfv is a new utility that can be used to visualize PostgreSQL data file pages.