Timestamps and Time Zones in PostgreSQL

https://phili.pe/posts/timestamps-and-time-zones-in-postgresql/

June 19, 2016

PostgreSQL supports two types of timestamps, one with and one without time zone. Contrary to what the name suggests, timestamp with time zone does not store the time zone. The difference between the two types lies in the semantics and is often a source of confusion.

In order to understand the difference between the two timestamp types, one needs have a basic understanding of time zones. At least for me, this helped in making sense of the difference. Thus, I’ll first take a short detour and talk a bit about time and time zones before diving into times in PostgreSQL. Feel free to jump directly to another section.

The Concept of Time

Time is the indefinite continued progression of existence and events that occur in apparently irreversible succession from the past through the present to the future.

Wikipedia

I like to think of time as a continuous timeline representing the absolute notion of time.

timeline

What does absolute mean in this context? I’ll try to explain it with an example. When daylight saving changes from summer to winter time it causes the clock to be set back by one hour. We gain an additional hour on this day. Since we don’t have clocks with 25 hours, a certain hour (usually the third) is repeated, i.e., the clock will show 2:45 twice on this day.

Does that mean we traveled back in time? Of course not! Time steadily progresses along the continuous timeline. The only thing that changed is how the absolute time is mapped to the time we read on a clock, also known as wall time.

We keep track of the absolute time in a special time zone called Coordinated Universal Time, or short UTC. It does not observe daylight saving time and thus can represent the continuous notion of time.

We can denote other time zones by offsets to UTC, such as +02:00 or -10:30. Further, there are named time zones like America/New_York or Europe/Berlin. Each such named time zone defines a set of rules denoting which UTC offset applies at any given moment in time. These rules for instance encode daylight saving periods or historical adjustments of the time zone.

The time zone database, maintained by Paul Eggert, is available at IANA (or on GitHub). It contains all time zones with their rules. I encourage you to check it out. You might find some interesting historical facts about your local time zone in the comments, like this one:

One further detail for Switzerland, which is probably out of scope for most users of tzdata: The Europe/Zurich zone describes all of Switzerland correctly, with the exception of the Canton de Genève (Geneva, Genf). Between 1848 and 1894 Geneva did not follow Bern Mean Time but kept its own local mean time. To represent this, an extra zone would be needed.

To summarize, we can think of time zones as functions that uniquely map absolute time to local time, i.e. the wall time.

time zone mapping

The same goes the other way around. Knowing the time zone, we can convert any wall time to an absolute time, i.e. UTC. There is one caveat: certain wall times don’t uniquely map to an absolute time, they’re ambiguous. We can see this in the graph above where, for time zone B, certain y-values have multiple x-values. This is the case when transitioning to winter time where a determined hour is repeated.

Timestamps in PostgreSQL

The two timestamp types available in PostgreSQL are timestamp without time zone and timestamp with time zone. For brevity’s sake, I’ll be using their shorthand versions timestamp and timestamptz.

I’ve started using PostgreSQL through an ORM, specifically Active Record, the built-in ORM in Ruby on Rails. Active Record takes care of converting a timestamp from the application’s time zone to UTC before storing it in the DB. It is generally considered a good practice to store timestamps in UTC, otherwise you’ll run into problems related to daylight saving changes. Active Record uses timestamp columns to store timestamps.

Only having known timestamp, I was excited when I first encountered timestamp with time zone, or timestamptz. I thought to myself: cool, I can store a timestamp along with its time zone. So I started playing in the console:

db=# SELECT '2016-01-01 00:00+10'::timestamptz;
      timestamptz
------------------------
 2015-12-31 15:00:00+01
(1 row)

Hmm, that was not what I expected. Why didn’t it return 2016-01-01 00:00+10 as I had input? Clearly, the time zone +10 is not being stored. Instead, it is converting it to my computer’s local time zone Europe/Zurich.

So I started reading the docs and learning more about time zones. Specifically, the PostgreSQL documentation on time zones says:

All timezone-aware dates and times are stored internally in UTC.

The PostgreSQL source confirms this that timestamptz is stored as a primitive value, seemingly indistinguishable from timestamp:

#ifdef HAVE_INT64_TIMESTAMP
typedef int64 Timestamp;
typedef int64 TimestampTz;
#else
typedef double Timestamp;
typedef double TimestampTz;
#endif

With a better understanding of time zones, I realized that there is actually no point in storing a timestamp’s time zone. After all, the timestamp denotes an absolute point in time and can thus be converted from UTC back to any time zone for display purposes. That’s exactly what PostgreSQL does:

[Timezone-aware dates and times] are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.

This explains why in the example before the timestamptz was rendered not in UTC, but in the time zone set by the timezone configuration:

db=# SHOW timezone;
   TimeZone
---------------
 Europe/Zurich
(1 row)

db=# SELECT '2016-01-01 00:00+10'::timestamptz;
      timestamptz
------------------------
 2015-12-31 15:00:00+01
(1 row)

Why Two Types Of Timestamps?

You might be asking yourself, what is the point of having an additional timestamp type if it is stored the same way?

The answer is semantics. Whenever you encounter a timestamptz, you know it denotes absolute time. It was stored in UTC. However, when you come across timestamp, you can’t possibly know in what time zone the timestamp is just by looking at it. It is merely a wall time. Maybe it’s UTC, or maybe the developer stored it in local time. Without additional context, you can’t be sure.

Hence, we should always be using timestamptz whenever we denote an absolute point in time.

Working With Timestamps

In this section I’ll show various things that can be done with timestamps in PostgreSQL. We’ll assume a default timezone configuration of UTC in the examples.

Changing The Timezone Configuration

Whenever I connect to a remote database and look at timestamps, I have to mentally translate them from UTC to my local time zone in order to find out how long ago something happened. I’m based in Zurich, which has a UTC offset of one or two hours, depending on the time of year, further complicating things.

As shown earlier, timestamptz values are shown in the timezone defined by the timezone configuration. This configuration can be set on a per session basis. This means that I can simply run

SET timezone TO 'Europe/Zurich';

and all timestamptz values will be printed in my time zone. I don’t have to do the mental mapping anymore.

db=# SELECT now();
              now
-------------------------------
 2016-05-29 19:20:38.505126+00
(1 row)

db=# SET timezone TO 'Europe/Zurich';
SET
db=# SELECT now();
              now
-------------------------------
 2016-05-29 21:20:56.619098+02
(1 row)

This timezone configuration has another effect. When parsing a timestamp that has no time zone designator (e.g. Z or ±hhmm), it will be assumed to be local to the currently set timezone:

db=# SET timezone TO 'US/Pacific';
SET
db=# SELECT '2016-01-01 00:00'::timestamptz;
      timestamptz
------------------------
 2016-01-01 00:00:00-08
(1 row)

db=# SELECT '2016-01-01 00:00Z'::timestamptz;
      timestamptz
------------------------
 2015-12-31 16:00:00-08
(1 row)

The first example had no designator, thus the timestamp is parsed as US/Pacific, internally converted to UTC, and then converted back to US/Pacific for display. The second example had an explicit time zone, namely Z denoting UTC.

On the other hand, when casting a string to a timestamp, any time zone information present is ignored:

db=# SELECT '2016-01-01 00:00-10'::timestamp;
      timestamp
---------------------
 2016-01-01 00:00:00
(1 row)

db=# SELECT '2016-01-01 00:00+04'::timestamp;
      timestamp
---------------------
 2016-01-01 00:00:00
(1 row)

Converting Between Timezones

There are two ways of converting a timestamp to another time zone. One is using an expression, the other a function. The function timezone(zone, timestamp) is equivalent to the SQL-conforming construct timestamp AT TIME ZONE zone.

Let’s convert the timestamp 2016-01-01 00:00 to US/Pacific:

db=# SELECT timezone('US/Pacific', '2016-01-01 00:00');
      timezone
---------------------
 2015-12-31 16:00:00
(1 row)

We get the wall time in California for 2016-01-01 00:00 UTC. Note that we passed the timestamp as a string, which was implicitly cast to a timestamptz. Since no explicit time zone information was contained in it, it was parsed in the time zone set by the timezone config (UTC in this case). Generally it is better to be explicit by specifying a time zone offset and by manually type casting, i.e., by writing '2016-01-01 00:00Z'::timestamptz.

What happens when we instead cast the string to a timestamp?

db=# SELECT timezone('US/Pacific', '2016-01-01 00:00'::timestamp);
        timezone
------------------------
 2016-01-01 08:00:00+00
(1 row)

Rather than subtracting 8 hours, as it did in the previous example, it surprisingly added 8 hours. Let’s look a bit closer at the output to understand what is going on.

In the first example, where we passed a timestamptz to the conversion function, the return value was 2015-12-31 16:00:00. In the second example, we passed a timestamp to the function, which gave us 2016-01-01 08:00:00+00. Can you spot the difference? The first output has no time zone information while the latter does.

This means that when we converted a timestamptz we got back a timestamp, whereas when we converted a timestamp, it gave us a timestamptz. In other words, in the first example we converted an absolute time to a wall time whereas in the second example we did the inverse operation, i.e. we converted a wall time back to an absolute time.

timezone conversion

In reality, there are multiple timezone functions, one that takes timestamp and one that takes timestamptz. I like to think of them as follows:

  • When passing a time zone x and a value t of type timestamptz, i.e., timezone(x, t), I’m basically saying, give me the time the clock was showing in time zone x at the absolute point in time t.

  • When passing a time zone x and a value t of type timestamp, I’m saying, give me the absolute time for when clocks in time zone x were showing t.

Thanks to the two distinct timestamp types, one denoting absolute and the other wall time, we don’t have to specify from which to which time zone we want to convert, as is the case e.g. in MySQL’s CONVERT_TZ:

mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','GMT','MET');
        -> '2004-01-01 13:00:00'

Time Zone Aware Arithmetics

A day doesn’t always have 24 hours. As noted earlier, when daylight saving changes occur, the day can have 23 or 25 hours. What if you need to get all hours for any given day in a certain time zone, e.g. to calculate hourly statistics? This is where UTC or timestamptz helps us.

We can generate all hours between two timestamp values (or timestamptz) as follows:

SELECT generate_series(
    '2016-06-01 00:00'::timestamp,
    '2016-06-01 23:00'::timestamp,
    '1 hour'::interval
);

This returns us 24 timestamps. What about days in which daylight saving changes occur? If we don’t take the time zone into account, PostgreSQL will always return 24 hours.

Let’s take as an example the US Pacific time zone. In 2016, summer time started on March 13 and winter time is going to start on November 6. In other words, March 13 had 23 hours while November 6 will have 25 hours. To get the correct number of hours, we first convert the wall times to absolute times and then generate the series. The above query thus becomes:

SELECT generate_series(
    timezone('US/Pacific', '2016-06-01 00:00'::timestamp),
    timezone('US/Pacific', '2016-06-01 23:00'::timestamp),
    '1 hour'::interval
);

Now, for June 1 this still returns 24 hours as expected. But what about March 13 and November 6?

db=# SELECT generate_series(
    timezone('US/Pacific', '2016-03-13 00:00'::timestamp),
    timezone('US/Pacific', '2016-03-13 23:00'::timestamp),
    '1 hour'::interval
);
    generate_series
------------------------
 2016-03-13 08:00:00+00
 2016-03-13 09:00:00+00
 2016-03-13 10:00:00+00
 2016-03-13 11:00:00+00
 2016-03-13 12:00:00+00
 2016-03-13 13:00:00+00
 2016-03-13 14:00:00+00
 2016-03-13 15:00:00+00
 2016-03-13 16:00:00+00
 2016-03-13 17:00:00+00
 2016-03-13 18:00:00+00
 2016-03-13 19:00:00+00
 2016-03-13 20:00:00+00
 2016-03-13 21:00:00+00
 2016-03-13 22:00:00+00
 2016-03-13 23:00:00+00
 2016-03-14 00:00:00+00
 2016-03-14 01:00:00+00
 2016-03-14 02:00:00+00
 2016-03-14 03:00:00+00
 2016-03-14 04:00:00+00
 2016-03-14 05:00:00+00
 2016-03-14 06:00:00+00
(23 rows)

db=# SELECT generate_series(
    timezone('US/Pacific', '2016-11-06 00:00'::timestamp),
    timezone('US/Pacific', '2016-11-06 23:00'::timestamp),
    '1 hour'::interval
);
    generate_series
------------------------
 2016-11-06 07:00:00+00
 2016-11-06 08:00:00+00
 2016-11-06 09:00:00+00
 2016-11-06 10:00:00+00
 2016-11-06 11:00:00+00
 2016-11-06 12:00:00+00
 2016-11-06 13:00:00+00
 2016-11-06 14:00:00+00
 2016-11-06 15:00:00+00
 2016-11-06 16:00:00+00
 2016-11-06 17:00:00+00
 2016-11-06 18:00:00+00
 2016-11-06 19:00:00+00
 2016-11-06 20:00:00+00
 2016-11-06 21:00:00+00
 2016-11-06 22:00:00+00
 2016-11-06 23:00:00+00
 2016-11-07 00:00:00+00
 2016-11-07 01:00:00+00
 2016-11-07 02:00:00+00
 2016-11-07 03:00:00+00
 2016-11-07 04:00:00+00
 2016-11-07 05:00:00+00
 2016-11-07 06:00:00+00
 2016-11-07 07:00:00+00
(25 rows)

It returned 23 distinct hours for March 13 and 25 hours for November 6.

To drive the point home, let’s repeat that with the timezone config set to US/Pacific. With this config, we can make use of the fact that a string being type cast to timestamptz will be interpreted in the current time zone set by the timezone config if the string lacks a time zone designator. Thus, we can skip the explicit wall time to absolute time conversion:

db=# SET timezone TO 'US/Pacific';
SET
db=# SELECT generate_series(
    '2016-03-13 00:00'::timestamptz,
    '2016-03-13 23:00'::timestamptz,
    '1 hour'::interval
);
    generate_series
------------------------
 2016-03-13 00:00:00-08
 2016-03-13 01:00:00-08
 2016-03-13 03:00:00-07
 2016-03-13 04:00:00-07
 2016-03-13 05:00:00-07
 2016-03-13 06:00:00-07
 2016-03-13 07:00:00-07
 2016-03-13 08:00:00-07
 2016-03-13 09:00:00-07
 2016-03-13 10:00:00-07
 2016-03-13 11:00:00-07
 2016-03-13 12:00:00-07
 2016-03-13 13:00:00-07
 2016-03-13 14:00:00-07
 2016-03-13 15:00:00-07
 2016-03-13 16:00:00-07
 2016-03-13 17:00:00-07
 2016-03-13 18:00:00-07
 2016-03-13 19:00:00-07
 2016-03-13 20:00:00-07
 2016-03-13 21:00:00-07
 2016-03-13 22:00:00-07
 2016-03-13 23:00:00-07
(23 rows)

db=# SELECT generate_series(
    '2016-11-06 00:00'::timestamptz,
    '2016-11-06 23:00'::timestamptz,
    '1 hour'::interval
);
    generate_series
------------------------
 2016-11-06 00:00:00-07
 2016-11-06 01:00:00-07
 2016-11-06 01:00:00-08
 2016-11-06 02:00:00-08
 2016-11-06 03:00:00-08
 2016-11-06 04:00:00-08
 2016-11-06 05:00:00-08
 2016-11-06 06:00:00-08
 2016-11-06 07:00:00-08
 2016-11-06 08:00:00-08
 2016-11-06 09:00:00-08
 2016-11-06 10:00:00-08
 2016-11-06 11:00:00-08
 2016-11-06 12:00:00-08
 2016-11-06 13:00:00-08
 2016-11-06 14:00:00-08
 2016-11-06 15:00:00-08
 2016-11-06 16:00:00-08
 2016-11-06 17:00:00-08
 2016-11-06 18:00:00-08
 2016-11-06 19:00:00-08
 2016-11-06 20:00:00-08
 2016-11-06 21:00:00-08
 2016-11-06 22:00:00-08
 2016-11-06 23:00:00-08
(25 rows)

With the hours printed in local time, it is easy to see at which point the daylight saving changes occurred.

A Valid Use Case For timestamp without time zone

The only time we should be using timestamp over timestamptz is when the timestamp in question doesn’t denote an absolute point in time. I could only come up with one such scenario.

Imagine a system with various customers where each customer is located in a different time zone. We might want to enqueue a task to run at a certain time, e.g. at midnight, local to each customer. In such a situation, the timestamp (e.g. 2017-01-01 00:00) would not denote a single point in time, but potentially a different one for each customer. With a customer’s time zone we are able to convert that timestamp to an absolute point in time for each customer given their respective time zone.

Here’s a simple example showcasing this:

db=# WITH
customers(id, timezone) AS (VALUES
    (1, 'Australia/Sydney'),
    (2, 'Europe/Madrid'),
    (3, 'America/Los_Angeles')),
scripts(name, trigger_at) AS (VALUES
    ('calculate_yearly_revenue', '2017-01-01'::timestamp),
    ('calculate_easter_revenue', '2016-03-28'::timestamp))
SELECT
    customers.id AS customer_id,
    scripts.name AS script_name,
    timezone(customers.timezone, scripts.trigger_at) AS trigger_at
FROM customers, scripts;
 customer_id |       script_name        |       trigger_at
-------------+--------------------------+------------------------
           1 | calculate_yearly_revenue | 2016-12-31 13:00:00+00
           2 | calculate_yearly_revenue | 2016-12-31 23:00:00+00
           3 | calculate_yearly_revenue | 2017-01-01 08:00:00+00
           1 | calculate_easter_revenue | 2016-03-27 13:00:00+00
           2 | calculate_easter_revenue | 2016-03-27 22:00:00+00
           3 | calculate_easter_revenue | 2016-03-28 07:00:00+00
(6 rows)

Notice how each scripts.trigger_at of type timestamp resulted in a different timestamptz, namely one absolute point in time for each distinct time zone. Further, notice for Madrid and Los Angeles how midnight at Easter and midnight at New Year’s Day don’t result in the same UTC hour due to daylight saving.


Timestamps in PostgreSQL can be confusing. But once you understand them, working with timestamps and time zones in PostgreSQL becomes a joy. I hope that this article convinced you of that.

If you’re interested in further material regarding time and time zones in general and in PostgreSQL, I encourage you to check out these links:

Finally, you might also be interested in learning more about using PostgreSQL on the command line.

Leave a Reply

Your email address will not be published. Required fields are marked *

Next Post

Facebook extending Markdown support for Status Updates!

Sun Mar 31 , 2019
https://blog.praveen.science/facebook-extending-markdown-support-for-status-updates/

You May Like