A distributed database offers an antidote to “putting all your eggs in one basket.” The key difference between a centralized and a distributed database is where the information is stored. A document management system using a distributed database stores the document profiling information at various points, or nodes, dispersed throughout the network. The storage nodes may follow the network architecture or the disk structure. Though the data is stored in multiple physical locations, the distributed database is centrally managed. Distributing the database compartmentalizes the information, greatly reducing the chance of losing the entire database.
A typical implementation in a DMS might partition the profile database along the lines of the logical structure shared by the documents within the network. For example, each system folder (directory) containing profiled documents would have a corresponding DMS data set.
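The folder-to-data-set mapping described above can be sketched roughly as follows. This is a minimal illustration, not the layout of any particular DMS: the file name `.dms_profiles.json`, the JSON format, and the functions `load_profiles` and `save_profile` are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical name for the per-folder data set; not taken from any real DMS.
DATA_SET_NAME = ".dms_profiles.json"


def load_profiles(folder):
    """Return the profile data set stored in `folder` (empty if none exists yet)."""
    data_set = Path(folder) / DATA_SET_NAME
    if not data_set.exists():
        return {}
    return json.loads(data_set.read_text(encoding="utf-8"))


def save_profile(folder, document_name, profile):
    """Add or update one document's profile in its own folder's data set."""
    profiles = load_profiles(folder)
    profiles[document_name] = profile
    data_set = Path(folder) / DATA_SET_NAME
    data_set.write_text(json.dumps(profiles, indent=2), encoding="utf-8")
```

Because each data set lives beside the documents it describes, an update touches only one small file, and a routine backup of the folder picks up the profiles along with the documents.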
The distributed data approach offers several advantages:
- There is no single point of failure (with respect to data loss)
- Updates happen close to where the work occurs and may therefore happen faster (less latency in the system)
- Certain operations, such as “cloning” profiles, can be faster
- No special procedures are required to back up and restore document profile information; it gets backed up during the standard backup routine
- The distributed approach is inherently scalable
- A distributed database should make better use of the processor, as the individual data sets required by processing functions will be smaller.
As with the centralized database, the nature of the distributed database is transparent to the end user. A user simply works with the document management software to save documents, retrieve documents, etc. The DMS handles whatever mediation is necessary with respect to the profile database.
There are, to be sure, some disadvantages to using a distributed database within a document management system:
- Where databases share media with the profiled documents, database resources may be unprotected (e.g., users may delete database files)
- Data synchronization must be managed by the software, which requires processing overhead
- Searching across data sets can be slow.
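The search cost in the last point can be illustrated with a short sketch. It assumes, purely for illustration, that each folder keeps its profiles in a JSON file with the hypothetical name `.dms_profiles.json`, mapping document names to profile fields. The function `search_profiles` is likewise an invented name, not part of any real DMS.

```python
import json
import os

# Hypothetical per-folder data set file; an assumption for this example only.
DATA_SET_NAME = ".dms_profiles.json"


def search_profiles(root, field, value):
    """Scan every per-folder data set under `root` for matching profiles."""
    matches = []
    for folder, _subfolders, files in os.walk(root):
        if DATA_SET_NAME not in files:
            continue  # folder holds no profiled documents
        # Each data set must be opened and parsed separately.
        with open(os.path.join(folder, DATA_SET_NAME), encoding="utf-8") as f:
            profiles = json.load(f)
        for document_name, profile in profiles.items():
            if profile.get(field) == value:
                matches.append(os.path.join(folder, document_name))
    return sorted(matches)
```

A network-wide query has to visit every node and open every data set, so its cost grows with the number of folders rather than with the number of matching documents. A centralized database could answer the same query from a single index.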