Data Virtualization Cache notes

Here are some notes on Data Virtualization server caching.

  • When to use caching:
    • Query performance
    • Load optimization – to minimize load on the underlying system.
    • Consistent reporting
    • Source availability
    • Complex transformations
    • Security – to allow users to see aggregate data and still keep them from the ability to access detailed data.
  • Where to keep the cache:
    • Memory – gives you the fastest access, purged when the server restarts(?), more limited that storage in most envs
    • Database – works well for large data sets, creates and manages tables with indexes
    • File(s) – slower than memory, persistent across restarts, works well for small data sets
  • Refreshing the cache:
    • Construction refresh – refresh when its built, only once
    • Manual refresh – admin has to initiate a refresh
    • Scheduled refresh
    • Query-driven refresh – refreshed when queried if the cache has expired
    • Event-driven refresh – an event can initiate a refresh. an event could be something over an ESB, a file drop, API call, etc.
  • How to refresh the cache:
    • Incremental refresh – refresh only adds new data to the cache
    • Live refresh – basically CDC/instant refreshing
    • Full refresh – invalidate the cache and cache the data again
  • Online vs. Offline refreshing:
    • Offline refreshing – the table cannot be accessed during the refresh period
    • Online refreshing – queries CAN be executed against a table during a refresh. this data is obviously not complete though depending on the virtualization server’s implementation.
  • Cache replication:
    • It’s useful when:
      • the number of queries/users accessing the table is high
      • geographically dispersed consumers can have data cached physically closer to them for faster access
      • cache replication makes the table more available. even if one cache is down, you still have another cache to query against.
Please follow and like me:

Leave a Reply