I've been working on a consulting project where we're trying to use Microsoft products where it makes sense. I originally prototyped a few things in Postgres and then found myself in need of converting it over to SQL Server. After some quick googling, I discovered that most of the posts/Stack Overflow content on this topic was about going from SQL Server to Postgres, not the other way around! I did manage to find one Stack Overflow post with some helpful hints. Since the hints were all at a fairly high-level, I decided to write this post and provide some of the code I used.
Note: All of the code below was developed and tested on a Mac using ZSH (not bash). I don't think there's a difference between simple bash and zsh sed commands, so it should work fine.
The first step in the conversion process is re-creating the schema (i.e. table and field names, types). In terminal, run pg_dump on your Postgres database to get the current schema.
pg_dump --schema-only --no-owner [your-database-name] > schema_create_script.sql
SQL Server doesn't like some of the statements and syntax used by Postgres, so we're going to do some processing to strip that out before creating the new schema in SQL Server. I'll walk you through the approach and then display the code. First, we're going to add
BEGIN TRANSACTION (at the start) and
ROLLBACK TRANSACTION (at the end) to the script. This will allow you to
make a few test runs and figure out if you have any errors. At the end, once you've addressed all of the errors, you will
commment out the
ROLLBACK TRANSACTION and insert a
COMMIT TRANSACTION at the end. In your favorite editor (note: I've been
using Azure Data Studio), run the schema_create_script.sql and see what errors you get.
Next, let's go over the statement and syntax issues you will encounter:
- SET: Remove the SET statements from the top of the script. You can do this manually or use the sed commands below.
- Schema: You will also want to remove any instance of Postgres' ownership schema naming (i.e. "public.").
- Primary keys: Postgres and SQL Server treat the primary keys/unique identifiers in a table differently. Postgres can use a bigint
field type with a sequence to auto-increment. This needs to be replaced with something like
[field name] int IDENTITY PRIMARY KEY.
- Postgres uses boolean data types (true/false). SQL Server uses BIT (1/0).
And lastly, here are a few quick sed commands that can help with the schema conversion if wanted. In terminal, run the following:
sed -i '' '/^SET/ d' schema_create_script.sql sed -i '' 's/public\.//g' schema_create_script.sql
The next step is to migrate the data from Postgres over to SQL Server. In terminal, run pg_dump on Postgres to get the data.
pg_dump --data-only --column-inserts --table '[your tablename]' -d [your-database-name] > data_load_script.sql
Similar to the schema conversion, you're going to want to do some processing around the SET statements, schema naming, and other Postgres-specific statements/syntax.
sed -i '' 's/public\.//g' data_load_script.sql sed -i '' '/^SET/ d' data_load_script.sql sed -i '' '/^SELECT pg_catalog/ d' data_load_script.sql sed -i '' '1s/^/SET IDENTITY_INSERT dbo."[your tablename" ON\; /' data_load_script.sql
Lastly, I rebuilt my views from scratch rather than try to convert them. A lot of the casting from one type to another is very different. The nested casting and conversion formulas got too complex.
The biggest challenge was converting the integer-based datetime values to a date. The project I'm working on has a CRM system where the API sends the raw integers (vs a datetime). Postgres has a nice cast for this:
to_timestamp(("dateAdded"::numeric / 1000::numeric)::double precision) AS datetime_added
When I went to re-build the equivalent of this in SQL Server, I discovered that there are some dates that cause an arithmetic overflow using a similar approach:
DATEADD(s, CONVERT(bigint, dateAdded) / 1000, CONVERT(datetime, '1-1-1970 00:00:00')) AS datetime_added
So, after a lot of googling, I discovered an alternative that allowed all of the dates to convert without any overflow issues.
CAST(dateAdded / 86400000.0 + 25567 AS DATETIME) AS datetime_added