This is a read-only archive. Find the latest Linux articles, documentation, and answers at the new Linux.com!

Linux.com

Feature: Databases

Synchronize your databases with SqlSync

By Ben Martin on May 08, 2008 (4:00:00 PM)

Share    Print    Comments   

SqlSync lets you compare two databases to see which tuples have been added, removed, and changed. You can also use SqlSync to make one database a clone of another and maintain its contents to be that way. One benefit of using SqlSync to perform synchronization is that you can perform heterogeneous syncs -- for example, from MySQL to PostgreSQL.

No SqlSync packages are available in the standard repositories for Ubuntu, Fedora, or openSUSE. For the article I'll build from source using version 1.0.0-rc1 on a Fedora 8 64-bit machine. Although it is not mentioned on the project's freshmeat page or home page, SqlSync uses ODBC to access your databases. If you do not have the unixODBC development packages installed, the build will fail.

SqlSync doesn't use autotools to build itself, relying on Makefiles instead. The Makefile.log might hint at a reason if the build fails. I found that the build broke complaining about trying to make a shared library without the -fPIC position-independent code option. PIC code has the advantage that it can be loaded at different locations in memory; this is useful for compiling code for shared libraries, because they can be moved when multiple libraries would otherwise want the same address. When I attempted to build the static linked version I ran into another problem with linking. To solve it, I modified the Makefile.def file as shown below and ran make as shown to build SqlSync. This produces a statically linked tool src/sqlsync_static.

$ vi Makefile.def - ODBC_STATIC_LIB=`odbc_config --static-libs` -L`odbc_config --lib-prefix` -liconv + ODBC_STATIC_LIB=`odbc_config --static-libs` -L`odbc_config --lib-prefix` -ldl $ make WITH_SHARED=NO WITH_STATIC=YES clean all ...

The primary options to the sqlsync command specify the source and destination ODBC Data Source Name (DSN), any restriction on the tables and columns you want to compare, and whether you want to only show the queries that would perform the sync or actually run the sync as well. An ODBC DSN is a name that identifies a database, what driver to use, and other configuration parameters. If you run sqlsync without specifying to either show the queries or to update the destination database, it will show you a summary of the changes.

For initial testing I'll use two PostgreSQL databases: sourcedb and destinationdb. The initial database setup is shown below. Note that the database schema must exist in destinationdb prior to running sqlsync. The final command shown below will dump only the schema from sourcedb into destinationdb, meaning the tables will exist in destinationdb but there will be no tuples in destinationdb.

$ psql ben=# create database sourcedb; CREATE DATABASE ben=# create database destinationdb; CREATE DATABASE ben=# \c sourcedb sourcedb=# create table customers ( cid serial primary key, name varchar(100) ) ; sourcedb=# create table info ( id serial primary key, cid int references customers(cid), note text ); sourcedb=# insert into customers values ( default, 'fred' ); sourcedb=# insert into customers values ( default, 'john' ); sourcedb=# insert into customers values ( default, 'peter' ); sourcedb=# insert into info values ( default, (select cid from customers where name='fred'), 'fred is home' ); sourcedb=# insert into info values ( default, (select cid from customers where name='john'), 'john is with fred' ); sourcedb=# insert into info values ( default, (select cid from customers where name='fred'), 'a second note for him' ); sourcedb=# \q $ pg_dump -c -s sourcedb | psql destinationdb

To make things clear, I'll use the same ODBC DSN as the postgresql database name. You can set up ODBC DSNs with the ODBCConfig graphical client. First I show a summary of the differences between the two ODBC DSNs, followed by the SQL queries that would bring the destinationdb into the same state as the sourcedb, using the sqlsync --show-queries option.

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-summary Connected to dsn_src server sourcedb Connected to dsn_dst server destinationdb All available tables will be synchronized User didn't specify list of tables to synchronize Checking availability of specified tables Comparing tables customers Comparing tables customers completed [BEGIN] ----------------------------- [SUMMARY] --------------------------- Table customers Number of columns 2 Equal rows 0 Different rows 0 Missing rows 3 Additional rows 0 [COLUMNS] --------------------------- cid| name| [DIFF] ------------------------------ [MISSING] --------------------------- 1| fred| 2| john| 3| peter| [ADDITIONAL] ------------------------ [END] ------------------------------- Comparing tables info Comparing tables info completed [BEGIN] ----------------------------- [SUMMARY] --------------------------- Table info Number of columns 3 Equal rows 0 Different rows 0 Missing rows 3 Additional rows 0 [COLUMNS] --------------------------- id| cid| note| [DIFF] ------------------------------ [MISSING] --------------------------- 1| 1| fred is home| 2| 2| john is with fred| 3| 1| a second note for him| [ADDITIONAL] ------------------------ [END] ------------------------------- Comparing tables sql_parts Error occured while running SELECT query on source table, probably generated query is corrupted for sql_parts table [unixODBC]Error while executing the query (non-fatal); ERROR: relation "sql_parts" does not exist Comparing tables sql_parts completed [BEGIN] ----------------------------- [SUMMARY] --------------------------- Table sql_parts ... [END] ------------------------------- $ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries Connected to dsn_src server sourcedb Connected to dsn_dst server destinationdb ... Creating synchronization queries INSERT INTO "customers" ("cid", "name") VALUES (1, 'fred'); INSERT INTO "customers" ("cid", "name") VALUES (2, 'john'); INSERT INTO "customers" ("cid", "name") VALUES (3, 'peter'); Creating synchronization queries completed Comparing tables info Comparing tables info completed Creating synchronization queries INSERT INTO "info" ("id", "cid", "note") VALUES (1, 1, 'fred is home'); INSERT INTO "info" ("id", "cid", "note") VALUES (2, 2, 'john is with fred'); INSERT INTO "info" ("id", "cid", "note") VALUES (3, 1, 'a second note for him'); Creating synchronization queries completed ...

The --run-queries command will execute the queries that you saw with the --show-queries option to bring the destination database into the same state as the source database. After this sync has been performed, I add a new tuple into the source database and destination database. The final command shows that sqlsync will add the new tuple from the source database and delete the new tuple that I added to the destination database.

$ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --run-queries Connected to dsn_src server sourcedb Connected to dsn_dst server destinationdb ... Creating synchronization queries Running UPDATE synchronization queries Running DELETE synchronization queries Running INSERT synchronization queries Running synchronization query (0/3) ... $ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries Connected to dsn_src server sourcedb Connected to dsn_dst server destinationdb ... Creating synchronization queries Creating synchronization queries completed Comparing tables info Comparing tables info completed Creating synchronization queries Creating synchronization queries completed ... $ psql sourcedb sourcedb=# insert into info values ( default, (select cid from customers where name='john'), 'another' ); sourcedb=# \c destinationdb destinationdb=# insert into info (cid, note) values ( (select cid from customers where name='peter'), 'this will be gone' ); destinationdb=# \q $ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries ... Creating synchronization queries DELETE FROM "info" WHERE "id"=20; INSERT INTO "info" ("id", "cid", "note") VALUES (4, 2, 'another'); Creating synchronization queries completed

Next, I added a new tuple in the info table with a foreign key reference to the customers table, then synced it with sqlsync.

$ psql sourcedb sourcedb=# insert into customers values ( default, 'luke' ); sourcedb=# insert into info values ( default, (select cid from customers where name='luke'), 'lucky' ); sourcedb=# \q $ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --show-queries ... INSERT INTO "customers" ("cid", "name") VALUES (4, 'luke'); ... DELETE FROM "info" WHERE "id"=20; INSERT INTO "info" ("id", "cid", "note") VALUES (4, 2, 'another'); INSERT INTO "info" ("id", "cid", "note") VALUES (5, 4, 'lucky'); $ sqlsync_static --src-dsn sourcedb --dst-dsn destinationdb --run-queries

Working across DBMS engines

To test out the cross-database synchronization, I created a MySQL database to sync into called destinationdb, with an ODBC DSN called mysql-destinationdb. You can use the isql command from unixODBC to connect to a database through ODBC, which comes in handy if you are troubleshooting ODBC access issues. Instead of using the serial data type for the MySQL database schema, I just use a plain integer type. I used the user and password options in order to connect to the MySQL database DSN as shown in the sqlsync command below.

$ isql mysql-destinationdb user passwd SQL> create table customers ( cid int primary key, name varchar(100) ) ; SQL> create table info ( id int primary key, cid int references customers(cid), note text ); SQL> quit $ sqlsync_static --src-dsn sourcedb \ --dst-dsn mysql-destinationdb \ --dst-user user --dst-password passwd \ --show-queries ... Creating synchronization queries INSERT INTO `customers` (`cid`, `name`) VALUES (1, 'fred'); INSERT INTO `customers` (`cid`, `name`) VALUES (2, 'john'); INSERT INTO `customers` (`cid`, `name`) VALUES (3, 'peter'); INSERT INTO `customers` (`cid`, `name`) VALUES (5, 'luke'); ... INSERT INTO `info` (`id`, `cid`, `note`) VALUES (1, 1, 'fred is home'); INSERT INTO `info` (`id`, `cid`, `note`) VALUES (2, 2, 'john is with fred'); INSERT INTO `info` (`id`, `cid`, `note`) VALUES (3, 1, 'a second note for him'); INSERT INTO `info` (`id`, `cid`, `note`) VALUES (4, 2, 'another'); INSERT INTO `info` (`id`, `cid`, `note`) VALUES (6, 5, 'lucky');

You can restrict which tables and which columns from these tables will be synced using the --tables option to sqlsync. For example, I could sync only the name and note field using --tables customers[name],info[note]. To test this out I created a new database where the info table had only the id and note field. The sync is shown below.

The ability to pick out only some columns for syncing can be useful, but you cannot set up a table name mapping, for example syncing the info table into a limitedinfo table in the destination. There is nothing precluding such functionality being added to SqlSync in the future.

$ psql ben# create database partdestdb ben# \c partdestdb partdestdb# create table info ( id serial primary key, note text ); partdestdb# \q $ sqlsync_static --src-dsn sourcedb --dst-dsn partdestdb --tables info[id,note] --show-queries Parsing tables and columns passed from command line Parsing tables and columns completed Connected to dsn_src server sourcedb Connected to dsn_dst server partdestdb Checking availability of specified tables Comparing tables info Comparing tables info completed Creating synchronization queries INSERT INTO "info" ("id", "note") VALUES (1, 'fred is home'); INSERT INTO "info" ("id", "note") VALUES (2, 'john is with fred'); INSERT INTO "info" ("id", "note") VALUES (3, 'a second note for him'); INSERT INTO "info" ("id", "note") VALUES (4, 'another'); INSERT INTO "info" ("id", "note") VALUES (6, 'lucky'); Creating synchronization queries completed

SqlSync might be worth considering if you maintain a backup of a database. Although many databases provide tools for replication, using SqlSync lets you back up the database using a different database server. This could be useful if you are migrating between two database engines and wish to maintain the database on two different database servers while you are testing.

While SqlSync is handy, it's not perfect. The ability to attempt to set up a schema in the destination database automatically would be a welcome addition.

Ben Martin has been working on filesystems for more than 10 years. He completed his Ph.D. and now offers consulting services focused on libferris, filesystems, and search solutions.

Share    Print    Comments   

Comments

on Synchronize your databases with SqlSync

There are no comments attached to this item.

This story has been archived. Comments can no longer be posted.



 
Tableless layout Validate XHTML 1.0 Strict Validate CSS Powered by Xaraya