Postgres Performance Monitoring with PgHero
In a previous role, I needed to optimize a Postgres database storing 2-3 billion total rows on an r4.xlarge instance in AWS RDS. It held weather data parameters for the entire world, with portions updated every 5-15 minutes as new data came in and old data removed. Only 2-3 weeks of data was retained at a time, so the whole database would be recycled with fresh data in that window. To better handle and understand changes, I needed a way to see how each query was performing and how much impact each one had on database server usage.
What is PgHero
PgHero is a web-based performance dashboard built on top of pg_stat_statements. pg_stat_statements tracks execution statistics for the SQL statements run on a server, which can provide extremely valuable information when you are hunting down slow or unexpected database performance.
It may not be as pretty as AWS RDS Performance Insights, but it surfaces enough information along with helpful suggestions for reviewing the performance impact of your queries.
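Since everything PgHero surfaces is derived from that extension, you can also sanity-check the raw data yourself. For example, a query for the most expensive statements by cumulative execution time (column names as of Postgres 13+):
-- Top statements by total execution time, straight from pg_stat_statements
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;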
Example Weather Data
Here I’ve set up a Postgres instance with PgHero and generated ~125M rows in a single weather_record table with the following simple schema.
create table public.weather_record
(
    id           serial primary key,
    time         timestamp not null,
    energy_delta double precision not null,
    ghi          double precision not null,
    temperature  double precision not null,
    pressure     double precision not null,
    humidity     double precision not null,
    wind_speed   double precision not null,
    rain1h       double precision not null,
    snow1h       double precision not null,
    clouds_all   integer not null,
    latitude     double precision not null,
    longitude    double precision not null
);
The data is based on a small dataset from Kaggle, which I replicated with some randomness across the globe at a 0.5 degree interval.
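As a minimal sketch of that replication step, assuming the Kaggle rows were loaded into a hypothetical seed_data staging table with matching columns, generate_series can produce the 0.5 degree grid:
-- Hypothetical replication: cross-join the seed rows against a lat/long grid
INSERT INTO weather_record
    (time, energy_delta, ghi, temperature, pressure, humidity,
     wind_speed, rain1h, snow1h, clouds_all, latitude, longitude)
SELECT s.time, s.energy_delta, s.ghi,
       s.temperature + random(), -- a little randomness per location
       s.pressure, s.humidity, s.wind_speed, s.rain1h, s.snow1h,
       s.clouds_all, lat.lat, lon.lon
FROM seed_data s
CROSS JOIN generate_series(-90.0, 90.0, 0.5) AS lat(lat)
CROSS JOIN generate_series(-180.0, 179.5, 0.5) AS lon(lon);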
My environment is set up with Docker Compose using the following PostGIS image along with PgHero.
version: '3.8'
services:
  postgres:
    image: postgis/postgis:15-3.3
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: demo
      POSTGRES_INITDB_ARGS: "--data-checksums"
      POSTGRES_INITDB_WALDIR: "/var/lib/postgresql/wal"
      POSTGRES_HOST_AUTH_METHOD: "trust"
    ports:
      - "5435:5432" # 5435 to avoid conflicts
    volumes:
      - postgres_data:/var/lib/postgresql/data
    # Enable pg_stat_statements and I/O timing via server flags
    # (the image has no POSTGRES_* variables for these settings)
    command: >
      postgres
      -c shared_preload_libraries=pg_stat_statements
      -c pg_stat_statements.track=all
      -c pg_stat_statements.max=10000
      -c track_io_timing=on
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  pghero:
    image: ankane/pghero
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://postgres:postgres@postgres:5432/demo
    ports:
      - "8080:8080"
volumes:
  postgres_data:
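One thing the compose file alone does not do is create the extension inside the database itself; PgHero reads from the pg_stat_statements view, which requires a one-time command in the demo database (this could equally live in an init script):
-- Expose the pg_stat_statements view in the demo database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;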
Queries
Once the data is ingested, we can visit localhost:8080 to view PgHero and navigate to the “Queries” tab.
Then, without optimizing anything, I performed a single:
select * from weather_record where latitude > -40 and latitude < -39 limit 5;
This took ~10 seconds, as we can see in the PgHero screenshot above. PgHero also automatically suggests creating an index on latitude, because doing a full sequential scan over 125M rows is generally not a good idea.
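In my case the suggestion amounts to a plain b-tree index on latitude, along the lines of (CONCURRENTLY keeps the table writable while the index builds):
CREATE INDEX CONCURRENTLY ON weather_record (latitude);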
Running the suggested command, we can see the CREATE INDEX command also show up in our Queries tab, taking almost 2 minutes. After four more latitude queries, our average drops from 10 seconds to 2 seconds. To get a more accurate picture, we can click “Reset” at the top right to clear the stats and rerun the query.
Re-running the query, we now get sub-1 ms response times, and as we run more queries this fast query drops out of view, since by default the list is sorted by each query’s total time impact. This is a small touch, but a huge UX improvement: in a DevOps role, you can quickly reset the stats, watch live traffic, and see the most impactful queries rise to the top.
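Under the hood, that Reset effectively maps to the extension’s reset function, which can also be called directly:
-- Clear the accumulated pg_stat_statements counters
SELECT pg_stat_statements_reset();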
To leverage PostGIS more effectively, we could add a geography(Point) column:
-- Add geography column
ALTER TABLE weather_record
ADD COLUMN geog geography(Point);
-- Populate it from lat/long
UPDATE weather_record
SET geog = ST_MakePoint(longitude, latitude)::geography;
-- Create index
CREATE INDEX idx_weather_record_geog
ON weather_record USING GIST(geog);
And query it using that index:
SELECT *
FROM weather_record
WHERE ST_DWithin(
    geog,
    -- ST_MakePoint takes (longitude, latitude); latitude must be within ±90
    ST_MakePoint(130, -30)::geography,
    50000 -- meters
)
limit 5;
The pop-under suggestions can be great, especially when running against a production database: if an index was missed and live queries are impacted, they enable a quick turnaround in resolving the performance issue. In my previous role, the team I led deployed to production multiple times a day, and we all had access to PgHero, so just after a deployment we could refresh this Queries view to quickly spot any obvious issues.
Also, when we or our clients reported slowness in fetching data, we could check PgHero first to find, or rule out, obvious issues.
Explain
We can also use the Explain tab for both the original latitude query and an equivalent longitude query, all from the built-in web UI.
# Latitude with index present
Limit  (cost=0.57..1.07 rows=5 width=96)
  ->  Index Scan using weather_record_latitude_idx on weather_record  (cost=0.57..27799.85 rows=276180 width=96)
        Index Cond: ((latitude > '-40'::double precision) AND (latitude < '-39'::double precision))

# Longitude without index present
Limit  (cost=0.00..96.43 rows=5 width=96)
  ->  Seq Scan on weather_record  (cost=0.00..3922146.20 rows=203358 width=96)
        Filter: ((longitude > '-40'::double precision) AND (longitude < '-39'::double precision))
Straight away we can see the latitude query leveraging the weather_record_latitude_idx index, greatly reducing the cost of the query.
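If you prefer the terminal, the same plan can be produced from psql:
EXPLAIN
SELECT * FROM weather_record
WHERE longitude > -40 AND longitude < -39
LIMIT 5;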
Space
Another useful tab is Space, which as the name suggests gives you a very simple per-table, per-index, and per-materialized-view report of storage used. And just like the Queries tab, it is automatically sorted by largest ‘impact’ (in this case GB stored) by default.
Space will also help you optimize indexes by tagging the ones that have never been used in a query.
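That check can be approximated directly against the statistics views if you ever want to script it; a rough equivalent:
-- Indexes that have never been scanned, largest first
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;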
Connections
The Connections tab is a simple view on total connections by user and location.
Here we can see a connection from PgHero itself (puma being the Ruby web process), along with DataGrip and psql client connections.
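The same breakdown can be pulled by hand from pg_stat_activity when the UI isn’t available:
-- Connections grouped by user and client address
SELECT usename, client_addr, count(*) AS connections
FROM pg_stat_activity
GROUP BY usename, client_addr
ORDER BY connections DESC;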
Live Queries
Live Queries enables you to kill long-running statements, which can be handy if mistakes have been made. A good example I experienced was running a CREATE INDEX statement that exceeded the server memory, forcing it into swap and tanking performance. By monitoring a production Postgres instance using PgHero, wherever it might be hosted, you can at least take action more easily if things don’t go according to plan.
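Killing a statement from the UI boils down to Postgres’s standard admin functions, which are worth knowing for when you only have psql:
-- Find the offending backend...
SELECT pid, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY runtime DESC;

-- ...then cancel the statement, or terminate the whole connection
SELECT pg_cancel_backend(12345);    -- 12345 is a hypothetical pid
SELECT pg_terminate_backend(12345);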
Maintenance
The Maintenance tab can make it obvious why storage is not being reclaimed or why optimizations are not taking effect, since we can see the “Last Vacuum” and “Last Analyze” columns.
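Those columns come from Postgres’s own table statistics, which can also be queried directly:
-- Per-table vacuum and analyze history
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;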
Tune
This is a basic view of the current database configuration, with a link to the PgTune tool to give you rough ideas for database instance optimizations. The right values are heavily dependent on your database usage characteristics, such as whether you are read- or write-heavy, whether you use shared storage, and other environmental context. It is best to try this kind of tuning in lower environments where possible, since some changes require a server restart.
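For reference, the kinds of settings PgTune suggests changing can be inspected in place before and after a change:
-- A few of the settings tuning tools typically adjust
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'effective_cache_size', 'max_connections');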
If you are in a DevOps role and looking for non-cloud-tied tooling to assist with Postgres optimizations, PgHero is well worth checking out!