A talk about PgFincore is scheduled at PgCon10.
PgFincore is a set of functions to manage blocks in memory.
Each Table or Index is truncated in segment of (usually) 1GB, and each segment is truncated in block in filesystem.
Those functions let you know which and how many disk block from a relation are in the buffer cache of the operating system, and eventually write the result to a file. Then using this file, it is possible to restore the buffer cache state for each block of the relation.
Other functions are used to set a POSIX_FADVISE flag on the entire relation (each segment). The more usefull are probably WILLNEED and DONTNEED which push and pop blocks of each segments of a relation from buffer cache, respectively.
Each functions are call with at least a table name or an index name (or oid) as a parameter and walk each segment of the relation.
Lastest release is 0.4 : direct download
You can grab the latest code with git :
git clone git://git.postgresql.org/git/pgfincore.git
Or browse it
And the project is on pgfoundry : PgFincore
First, make stage :
export USE_PGXS=1 ; make clean ; make su ; export USE_PGXS=1 ; make install
Then, create the functions from the sql script (it should be in your contrib directory) :
psql mydb -f pgfincore.sql
Here are some examples of usage. If you want more details go to Documentation
You may want to try to keep a table or an index into the OS Page Buffer, or preload a table before your well know big query is executed (reducing the query time).
To do so, just execute the following query frequently enough (perhaps using crontab):
SELECT * FROM pgfadv_willneed('pgbench_accounts'); ...
You may want to restore a table or an index into the OS Page Buffer as it was while you do the snapshot. For example if you have to reboot your server, then when PostgreSQL start up the first queries might be slow because nether PostgreSQL or the OS have cache pages about the relations involved in those first queries. Executing a snapshot and a restore is very simple :
-- Snapshot SELECT * FROM pgmincore_snapshot('pgbench_accounts'); ... -- Restore SELECT * FROM pgfadv_willneed_snapshot('pgbench_accounts'); ...
Several functions are actually provided :
This function output size of OS blocks, number of free page in the OS Page Buffer.
cedric=# select * from pgsysconf();
block_size | block_free
------------+------------
4096 | 417534
This function provide information about the file system cache (buffer cache).
For the specified relation (can be call with tableoid too) it return :
cedric=# select * from pgmincore('pgbench_accounts');
relpath | block_size | block_disk | block_mem | group_mem
--------------------+------------+------------+-----------+-----------
base/16384/16603 | 4096 | 262144 | 0 | 0
base/16384/16603.1 | 4096 | 65726 | 0 | 0
cedric=# select * from pgbench_accounts limit 10000;
cedric=# select * from pgmincore('pgbench_accounts');
relpath | block_size | block_disk | block_mem | group_mem
--------------------+------------+------------+-----------+-----------
base/16384/16603 | 4096 | 262144 | 414 | 1
base/16384/16603.1 | 4096 | 65726 | 0 | 0
This function write a file with _mincore suffix for each segment of the relation. So it does a snapshot of on memory blocks per segments, allowing a reload with pgfadv_willneed_snapshot.
cedric=# select * from pgmincore_snapshot('pgbench_accounts');
relpath | block_size | block_disk | block_mem | group_mem
----------------------------+------------+------------+-----------+-----------
base/16385/49240_mincore | 4096 | 262144 | 0 | 0
base/16385/49240.1_mincore | 4096 | 262144 | 238180 | 2
base/16385/49240.2_mincore | 4096 | 262144 | 56478 | 2
base/16385/49240.3_mincore | 4096 | 46902 | 0 | 0
# ls -1 /var/lib/postgresql/8.4/main/base/16385/49240*
49240
49240.1
49240.1_mincore
49240.2
49240.2_mincore
49240.3
49240.3_mincore
49240_fsm
49240_mincore
49240_vm
All those function call fadvise with the prefix as the flag and return the same columns :
For the specified relation (can be call with tableoid too) it return :
This function set WILLNEED flag on the current relation. It means that OS will try to load as much blocks as possible of the relation. Main idea is to preload files on server startup, perhaps using cache hit/miss ratio or most required relations/indexes.
cedric=# select * from pgfadv_willneed('pgbench_accounts');
relpath | block_size | block_disk | block_free
--------------------+------------+------------+------------
base/16384/16603 | 4096 | 262144 | 3744
base/16384/16603.1 | 4096 | 65726 | 4236
This function set WILLNEED flag on each block which were in memory when pgmincore_snapshot was called .
This function set DONTNEED flag on the current relation. It means that OS will first unload blocks of the file if it need to free so memory. Main idea is to unload files when they are not usefull anymore (instead of perhaps more interesting blocks)
This function set NORMAL flag on the current relation.
This function set RANDOM flag on the current relation.
This function set SEQUENTIAL flag on the current relation.
You can debug the function with the following error level : DEBUG1 and DEBUG5
set client_min_messages TO debug5;