Collections: Approving and indexing very large batches of records
Last review: July 19, 2013
Description: If you have a staging or development server, special workflows can make it easier to approve and index very large batches.
When records are added to CONTENTdm in extremely large batches (e.g., tens of thousands of records added with the Flex Loader or Catcher add-ons), it can be difficult to approve and index them using the regular workflows in CONTENTdm Administration. The Approve and Index functions in CONTENTdm Administration are intended for day-to-day use and incremental updates to collections, and they work as expected when the Approve queue holds only thousands of records. When bulk methods have added tens of thousands of records all at once, those same functions can run into problems.
If you have a staging or development server, the following alternate workflow is recommended for integrating large batches of records more quickly. Note, however, that the speed depends on highly variable and unpredictable environmental factors. Because the workflow requires you to take the search engine offline for several hours, it is often not a good option for a live production server.
The approvecmd and buildcmd command-line scripts run exactly the same processes that are executed when Approve and Index are initiated in CONTENTdm Administration. However, buildcmd in particular has some command-line-only options that provide a better workflow for dealing with large batches.
If a collection has tens of thousands of records in the Approve queue (compound object pages are included in the record count), the Index process can take an extremely long time to complete. This is because the index building is happening while the search engine is loaded in RAM and is available to return search results. The process is much faster if the search engine is taken offline while indexing.
The following steps outline the most efficient way to approve and index very large batches of records. PLEASE NOTE: On Linux servers, in order to avoid creating permissions problems, you must run these commands as the Apache user, not as root.
1. approvecmd /collection_alias
2. buildcmd -c /collection_alias
3. Run the indexALL process (syntax differs by platform).
Note: The search engine will be offline for the duration of this process.
4. buildcmd -i /collection_alias
The "buildcmd -c" operation (step 2) completes the integration of the records into the collection without doing any indexing. The "indexALL" process (step 3) rebuilds the index for all of the collections on the server. The "buildcmd -i" step (step 4) creates the field vocabulary lists from the freshly built index.
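The order of the steps above can be sketched as a small dry-run helper that prints the commands for one collection without executing anything. This is only an illustration: the function names plan_batch and warn_if_root are hypothetical and not part of CONTENTdm, the exact indexALL invocation is left as a placeholder because its syntax differs by platform, and the sketch assumes approvecmd and buildcmd are on the PATH of whoever later runs the printed commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the command sequence for one collection; runs nothing.
# plan_batch and warn_if_root are hypothetical helper names, not CONTENTdm tools.

# plan_batch ALIAS: print the four steps in the order given above.
plan_batch() {
    coll=$1
    case "$coll" in
        /?*) ;;  # aliases are written with a leading slash, as in the steps above
        *) echo "error: alias must look like /collection_alias" >&2; return 1 ;;
    esac
    echo "approvecmd $coll"
    echo "buildcmd -c $coll"
    # Step 3 is deliberately a placeholder: the indexALL syntax is platform-specific.
    echo "# step 3: run the platform-specific indexALL command (search engine offline)"
    echo "buildcmd -i $coll"
}

# warn_if_root: on Linux the commands must run as the Apache user, not as root.
warn_if_root() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "warning: running as root; switch to the Apache user first" >&2
        return 1
    fi
    return 0
}

# Example: print the plan for a placeholder alias.
plan_batch "/collection_alias"
```

On a Linux server, calling warn_if_root before running the printed commands gives a quick reminder of the permissions note above. The account name of the Apache user varies between installations, so the sketch does not guess it.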