Migrating base-UDT data byte order without a dump/reload

Suppose a database has used PL/Java-defined base types in the past, has data stored in columns of those types, and an upgrade is planned to a future PL/Java release with a changed default byte order for base types. For example, consider this table with a column of type javatest.complex (this type is one of the supplied examples).

postgres=# \d complexcircle
     Table "public.complexcircle"
 Column |       Type       | Modifiers
--------+------------------+-----------
 k      | integer          |
 z      | javatest.complex |

postgres=# SELECT * FROM complexcircle;
 k |                  z
---+--------------------------------------
 0 | (1.0,0.0)
 1 | (0.92387953251129,0.38268343236509)
 2 | (0.70710678118655,0.70710678118655)
 3 | (0.38268343236509,0.92387953251129)
 4 | (0.0,1.0)
 5 | (-0.38268343236509,0.92387953251129)
 6 | (-0.70710678118655,0.70710678118655)
 7 | (-0.92387953251129,0.38268343236509)
(8 rows)

The table holds eight equally spaced points around a circle in the complex plane.

Older versions of PL/Java always stored such data with big-endian byte order. PL/Java 1.5.0 allows byte order to be configured, but the default for PL/Java-defined base types remains big-endian. In a future release, the default will be changed to native, that is, whatever byte order the underlying hardware uses. Intel-based hardware, for example, is little-endian, and such a change would cause the stored data to be read incorrectly.

In PL/Java 1.5.0, by default, the legacy table will still be handled correctly. The control of byte order is by four Java system properties named org.postgresql.pljava.udt.byteorder.$kind.$dir where $kind is scalar or mirror and $dir is the conversion direction, p2j for data conversions from PostgreSQL to Java or j2p for the reverse. Loading a PL/Java 1.5.0 without changing the default settings, a quick query of these properties will confirm what they are:

postgres=# SELECT kind, dir, javatest.java_getsystemproperty(
  format('org.postgresql.pljava.udt.byteorder.%s.%s', kind, dir))
  FROM
    (VALUES ('scalar'), ('mirror')) AS kinds(kind),
    (VALUES ('p2j'), ('j2p')) AS directions(dir)
  ORDER BY kind;
  kind  | dir | java_getsystemproperty
--------+-----+------------------------
 mirror | p2j | native
 mirror | j2p | native
 scalar | p2j | big_endian
 scalar | j2p | big_endian
(4 rows)

Because scalar (base) types are still using the old, big-endian layout, a SELECT from the legacy table should succeed, and it does:

postgres=# SELECT * FROM complexcircle;
 k |                  z
---+--------------------------------------
 0 | (1.0,0.0)
 1 | (0.92387953251129,0.38268343236509)
 2 | (0.70710678118655,0.70710678118655)
 3 | (0.38268343236509,0.92387953251129)
 4 | (0.0,1.0)
 5 | (-0.38268343236509,0.92387953251129)
 6 | (-0.70710678118655,0.70710678118655)
 7 | (-0.92387953251129,0.38268343236509)
(8 rows)

But what happens after using \c to get a new session, and SET pljava.vmoptions to change the byte order?

postgres=# \c
You are now connected to database "postgres"
postgres=# SET pljava.vmoptions
TO '-Dorg.postgresql.pljava.udt.byteorder.scalar=native';
SET
postgres=#

Here you can see that a shortcut is possible, to set both the p2j and j2p properties by leaving the direction off. It is even possible to leave off the kind and direction, and set both conversion directions for both scalar and mirror types, by setting only one property, org.postgresql.pljava.udt.byteorder. This works only at initialization time; after the Java VM is running, these shorter-named properties do not exist, and setting them has no effect.

After setting pljava.vmoptions as above, querying the properties again should show them set accordingly, and then another query shows how that legacy table looks:

postgres=# SELECT kind, dir, javatest.java_getsystemproperty(
  format('org.postgresql.pljava.udt.byteorder.%s.%s', kind, dir))
  FROM
    (VALUES ('scalar'), ('mirror')) AS kinds(kind),
    (VALUES ('p2j'), ('j2p')) AS directions(dir)
  ORDER BY kind;
  kind  | dir | java_getsystemproperty
--------+-----+------------------------
 mirror | p2j | native
 mirror | j2p | native
 scalar | p2j | native
 scalar | j2p | native
(4 rows)
postgres=# SELECT * FROM complexcircle;
 k |                         z
---+---------------------------------------------------
 0 | (3.03865E-319,0.0)
 1 | (3.526206619982328E171,2.2885481907327986E191)
 2 | (-1.0377513410652091E170,-1.0377513410652091E170)
 3 | (2.2885481907327986E191,3.526206619982328E171)
 4 | (0.0,3.03865E-319)
 5 | (2.288548190732839E191,3.526206619982328E171)
 6 | (-1.0377513410652263E170,-1.0377513410652091E170)
 7 | (3.526206619982383E171,2.2885481907327986E191)
(8 rows)

Now the table seems to have totally bogus values in the z column. Note: as base types go, javatest.complex is quite simple, and reading it with the wrong byte order simply produces bogus values. For other Java-defined base types, other symptoms could result, such as exceptions during a query.

Migrating legacy data with a dump/reload

An obvious way to migrate the old data would be to do a (non-binary) dump, in a session where the byte-order is still the old default of big_endian, then start a new session with the order set to native, and reload the data.

Migrating data with an UPDATE

In many cases a simpler option is available, exploiting the fact that the two directions p2j and j2p can have their byte-order set independently.

You should be thinking that, for most purposes, that would be an outlandish thing to do. It can cause even the simplest queries to misbehave:

postgres=# \c
You are now connected to database "postgres"
postgres=# SET pljava.vmoptions
TO '-Dorg.postgresql.pljava.udt.byteorder.scalar.j2p=native';
SET
postgres=# SELECT kind, dir, javatest.java_getsystemproperty(
  format('org.postgresql.pljava.udt.byteorder.%s.%s', kind, dir))
  FROM
    (VALUES ('scalar'), ('mirror')) AS kinds(kind),
    (VALUES ('p2j'), ('j2p')) AS directions(dir)
  ORDER BY kind;
  kind  | dir | java_getsystemproperty
--------+-----+------------------------
 mirror | p2j | native
 mirror | j2p | native
 scalar | p2j | big_endian
 scalar | j2p | native
(4 rows)
postgres=# select '(1,2)'::javatest.complex;
         complex
--------------------------
 (3.03865E-319,3.16E-322)

Clearly not the kind of behavior you would want under any normal conditions. And again, note that for other base types, such a query might not just produce weird values, but throw errors or otherwise behave erratically.

And yet, for the one special purpose this page is about, having p2j and j2p set differently is exactly the trick. Let’s start another clean session:

postgres=# \c
You are now connected to database "postgres"
postgres=# SET pljava.vmoptions
TO '-Dorg.postgresql.pljava.udt.byteorder.scalar.p2j=big_endian '
   '-Dorg.postgresql.pljava.udt.byteorder.scalar.j2p=native';
SET
postgres=#

A quick select of a value from the column to be updated should show that it is still read correctly:

postgres=# SELECT z FROM complexcircle LIMIT 1;
     z
-----------
 (1.0,0.0)

Good, that is one of the expected, correct values. Now, what will happen with a simple command like:

postgres=# UPDATE complexcircle SET z = z; -- PostgreSQL is too smart!
UPDATE 8
postgres=# SELECT z FROM complexcircle LIMIT 1;
     z
-----------
 (1.0,0.0)

No joy: the values still look good after this conversion attempt, which is of course not good, because it means nothing useful happened. PostgreSQL looked at SET z = z and realized it didn’t require doing anything.

What’s needed here is a Java function that will take any object as a parameter and return the same object. A function, in other words, that really changes nothing, but PostgreSQL doesn’t know that, and will therefore have to convert each stored value to a Java object (p2j), convert the returned object back (j2p), and store the updated value.

It turns out that exactly such a function is already supplied in the examples:

postgres=# \df javatest.loganyelement
                             List of functions
  Schema  |     Name      | Result data type | Argument data types |  Type
----------+---------------+------------------+---------------------+--------
 javatest | loganyelement | anyelement       | anyelement          | normal
(1 row)

postgres=# UPDATE complexcircle SET z = javatest.loganyelement(z);
UPDATE 8
postgres=# SELECT z FROM complexcircle LIMIT 1;
         z
--------------------
 (3.03865E-319,0.0)
(1 row)

Encouragingly, the value now looks bad, which for this purpose is good. How does it look in another clean session, with all byte order properties set to native?

postgres=# \c
You are now connected to database "postgres"
postgres=# SET pljava.vmoptions
TO '-Dorg.postgresql.pljava.udt.byteorder=native';
SET
postgres=# SELECT kind, dir, javatest.java_getsystemproperty(
  format('org.postgresql.pljava.udt.byteorder.%s.%s', kind, dir))
  FROM
    (VALUES ('scalar'), ('mirror')) AS kinds(kind),
    (VALUES ('p2j'), ('j2p')) AS directions(dir)
  ORDER BY kind;
  kind  | dir | java_getsystemproperty
--------+-----+------------------------
 mirror | p2j | native
 mirror | j2p | native
 scalar | p2j | native
 scalar | j2p | native
(4 rows)

postgres=# SELECT * FROM complexcircle;
 k |                  z
---+--------------------------------------
 0 | (1.0,0.0)
 1 | (0.92387953251129,0.38268343236509)
 2 | (0.70710678118655,0.70710678118655)
 3 | (0.38268343236509,0.92387953251129)
 4 | (0.0,1.0)
 5 | (-0.38268343236509,0.92387953251129)
 6 | (-0.70710678118655,0.70710678118655)
 7 | (-0.92387953251129,0.38268343236509)
(8 rows)

Mission accomplished. After using that procedure to update all columns of legacy data in PL/Java-defined base types, all that remains is to make sure PL/Java will use the new byte order from that point forward. If the migration is done at the time of upgrading to a PL/Java version that makes the new order default, the job is done; if migrating in advance, a persistently-saved pljava.vmoptions (in any of the usual ways, such as ALTER DATABASE ... SET, ALTER SYSTEM, or editing postgresql.conf) will need to include the new byte order setting, which can be taken out after a later upgrade to a PL/Java version that makes the new setting the default.