ODM tables initialization
When an ODM table is empty (after initial creation or after a schema change), it must be initialized with the current content of the corresponding DataHub entities. This must be done manually by calling /sql/data/initialize. When an initialization is requested, all ODM tables existing in the ODM database/schema are emptied and refilled in the following order:
- DataHubSite
- DataHubSource
- DataHubVariable
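As an illustration of the manual call, here is a minimal Python sketch that prepares a request to /sql/data/initialize. Only the path comes from this document. The base URL, the use of POST, and the helper name build_initialize_request are assumptions made for the example and should be checked against the actual service.

```python
import urllib.request

INITIALIZE_PATH = "/sql/data/initialize"  # path taken from this document


def build_initialize_request(base_url):
    """Build (but do not send) the initialization request.

    Assumption: the endpoint accepts POST without a body. The real
    service may expect another method or authentication headers.
    """
    url = base_url.rstrip("/") + INITIALIZE_PATH
    return urllib.request.Request(url, method="POST")


# Hypothetical usage, sending the request to an assumed host:
#   with urllib.request.urlopen(build_initialize_request("http://localhost:8080")) as resp:
#       print(resp.status)
```

Building the request separately from sending it makes it easy to inspect the URL and method before anything is emptied on the server.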
ODM initialization operation
Every ODM table is initialized according to the following pseudocode:
DELETE FROM TABLE
foreach batch_of_1000 in GetDataHubEntities()
    try bulk_insert batch_of_1000
    if bulk_insert failed
        foreach entity in batch_of_1000
            insert entity
The ODM synchronizer first tries to bulk insert rows in batches of 1000. If a bulk insert fails, the ODM synchronizer inserts every row of that batch individually. This gives valid rows a chance to be inserted even if a few rows in the batch cause insertion errors. The ODM synchronizer stops the initialization and clears the table in either of the following circumstances:
- There are fewer than 1000 entities to sync and 50% of them cause insertion errors
- There are more than 1000 entities to sync and 10% of a batch of 1000 entities cause insertion errors
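The bulk insert, the row-by-row fallback, and the abort rules described above can be sketched in Python as follows. This is an illustration only, not the synchronizer's actual code. It uses SQLite from the standard library as a stand-in database, and the function names and the returned tuple are hypothetical. It also makes three assumptions the text leaves open: a threshold counts as reached when the failure count is greater than or equal to it, exactly 1000 entities fall under the 10% rule, and the 10% is measured against the size of the batch being processed.

```python
import sqlite3
from itertools import islice

BATCH_SIZE = 1000


def batches(entities, size=BATCH_SIZE):
    """Yield successive lists of at most `size` entities."""
    it = iter(entities)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def initialize_table(conn, table, columns, entities):
    """Empty `table` and refill it. Returns (inserted, failed, aborted).

    `table` and `columns` are trusted identifiers, not user input.
    """
    entities = list(entities)
    # Assumption: 50% rule below 1000 entities, 10% rule otherwise.
    ratio = 0.5 if len(entities) < BATCH_SIZE else 0.1
    marks = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"
    with conn:  # commit the initial DELETE on its own
        conn.execute(f"DELETE FROM {table}")
    inserted = failed = 0
    for batch in batches(entities):
        try:
            with conn:  # commits on success, rolls back the whole batch on error
                conn.executemany(sql, batch)
        except sqlite3.Error:
            pass  # fall through to the row-by-row insert
        else:
            inserted += len(batch)
            continue
        batch_failed = 0
        for row in batch:
            try:
                with conn:
                    conn.execute(sql, row)
                inserted += 1
            except sqlite3.Error:
                batch_failed += 1
        failed += batch_failed
        if batch_failed >= ratio * len(batch):
            with conn:  # too many errors: stop and clear the table
                conn.execute(f"DELETE FROM {table}")
            return 0, failed, True
    return inserted, failed, False
```

With ten rows of which one violates a constraint, the bulk insert fails, nine rows are inserted individually, and the initialization continues. With four rows of which two are invalid, the 50% rule applies and the table is cleared.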
ODM initialization feedback
The ODM synchronizer inserts rows into the DataHubFeedback table (provided that the table exists) to report insertion statistics and errors.
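A minimal Python sketch of this reporting step is shown below, again using SQLite as a stand-in database. The layout of DataHubFeedback is not described in this document, so the column names (ReportedAt, TableName, Inserted, Failed, Errors) and the helper names are purely hypothetical. The existence check relies on SQLite's sqlite_master catalog, and another database engine would need its own catalog query.

```python
import sqlite3
from datetime import datetime, timezone


def table_exists(conn, table):
    """Return True if `table` exists (SQLite-specific catalog lookup)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def report_feedback(conn, table, inserted, failed, errors):
    """Write one statistics row to DataHubFeedback if that table exists.

    Returns True when a row was written, False when the table is missing.
    The column names are assumptions made for this example.
    """
    if not table_exists(conn, "DataHubFeedback"):
        return False
    with conn:  # commit the feedback row
        conn.execute(
            "INSERT INTO DataHubFeedback "
            "(ReportedAt, TableName, Inserted, Failed, Errors) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                datetime.now(timezone.utc).isoformat(),
                table,
                inserted,
                failed,
                "\n".join(errors),
            ),
        )
    return True
```

Skipping the insert when the table is missing mirrors the "provided that the table exists" condition, so a missing feedback table never makes the initialization fail.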