Harvest, Ingest, and Disseminate

Harvest content for a digital archive record

Important. Your digital archive record must contain a URL (in the ObjectLocator field of the digital archive record) from which to harvest content.

Follow the steps below to harvest content from the URL in the digital archive record.

  Action
1 Display a digital archive record.
From a results list. Click the title of the record.
The record displays.
2 On the Actions list, click Harvest Document.
The system responds: Select Yes to submit harvest request to the Digital Archive Harvester for the URL: [URL from your DA record]. Select No to return to the Edit page.
3 Click Yes.
The Harvest Properties screen appears.
4 Depth. For Current Path, depth is the number of directory levels below the entry point. For All Links, depth is the number of links away from the entry point.
Method.
  • Current Path. Harvests only linked content in subdirectories below the entry point.
  • All Links. Harvests all linked content in the domain, regardless of subdirectory.
Optional. Preview. Allows you to view files before harvesting.
Check the Preview box.

Click Submit.
Note: If you are harvesting a PDF file, Depth and Method are not available.
5 The Harvest Preview Process screen appears briefly while files are being prepared for harvest preview.
The Harvest Preview screen automatically appears.
6 Preview Statistics. Summarizes the type, number, and size (in bytes) of files to be harvested.

Harvest Properties. The properties you selected. Click Change Harvest Properties to change them and preview again.

Harvest Queue. Total number of files and their size in bytes. Click Harvest to harvest them. Click View Queue for more detailed file information.

Preview.
  • Included Files. Click to view a list of the files to be harvested and their paths (URLs).
  • Excluded Files. Click to view the files that will not be harvested and their paths (URLs).
7 Update files in the Harvest Queue (optional).

The Harvest Properties (depth and method) you chose determine the files contained in the Harvest Queue. Files optional to the harvest (i.e., other than the original HTML file and those needed to display it, such as GIFs) are listed as Included Files and are automatically checked.

Add or remove files. Check the box next to the file you want to add. Uncheck the box next to the file you want to remove.

Click Update to submit your changes to the Included Files list. The screen refreshes to reflect your changes.

Click Select All to add all files under Included Files.
Click Clear All to remove all checked files under Included Files.
Click Exit to exit the harvester and return to the digital archive record.
Click Reset to restore the selections as they were when you last clicked Update.
8 Harvest Files.

Click Harvest. The Harvest In-Process screen appears while harvesting.
9 Click Close to exit the harvester and return to the digital archive record.

Note: Closing the harvester does not stop the harvest process. You can exit and do other work while harvesting occurs.

Harvest Properties

You begin harvesting by choosing an entry point from which the harvester starts gathering content. The entry point is the URL in the ObjectLocator field. If files are not linked from or to other files, the harvester will not find them. The harvester does not collect content outside the entry point domain. (A domain is the part of a URL between http:// and the first slash.) There are two methods of harvesting:

  • All Links
  • Current Path

All Links. From the entry point, the harvester follows all links, collecting content from anywhere within the domain. You specify how many links away from the entry point the harvester goes by using the depth setting.
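The All Links rule can be pictured as a breadth-first walk over links that stops at the chosen depth and never leaves the entry point domain. The sketch below is only an illustration of that rule, not the harvester's actual code: the function names, the in-memory link map, and the example.org URLs are all hypothetical.

```python
from collections import deque
from typing import Dict, List
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the part of the URL between the scheme separator and the first slash."""
    return urlsplit(url).netloc.lower()


def all_links_harvest(entry_point: str, links: Dict[str, List[str]], depth: int) -> List[str]:
    """Collect URLs within `depth` links of the entry point, same domain only.

    `links` is a hypothetical map from a page URL to the URLs it links to;
    a real harvester would discover these by fetching and parsing pages.
    """
    domain = domain_of(entry_point)
    seen = {entry_point}
    queue = deque([(entry_point, 0)])
    collected = []
    while queue:
        url, distance = queue.popleft()
        collected.append(url)
        if distance == depth:
            continue  # depth limit reached: do not follow links from this page
        for target in links.get(url, []):
            if target not in seen and domain_of(target) == domain:
                seen.add(target)
                queue.append((target, distance + 1))
    return collected


# Hypothetical site used only to exercise the function.
EXAMPLE_ENTRY = "http://www.example.org/index.html"
EXAMPLE_LINKS = {
    "http://www.example.org/index.html": [
        "http://www.example.org/a.html",
        "http://other.example.com/x.html",  # different domain: never collected
    ],
    "http://www.example.org/a.html": ["http://www.example.org/docs/b.html"],
    "http://www.example.org/docs/b.html": ["http://www.example.org/docs/c.html"],
}
```

With a depth of 1, only the entry point and pages it links to directly are collected. Raising the depth widens the harvest one link at a time, which is why All Links can gather more content than expected on a heavily cross-linked site.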

Current Path. Sometimes All Links harvests too much content. Use the Current Path method to limit your harvest based on the directory structure of the web site. From the entry point, the harvester moves through subdirectories collecting the linked content they contain. You specify how many subdirectories down from the entry point the harvester goes by using the depth setting.
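For Current Path, the scope test depends on where a file sits in the directory structure rather than on how many links away it is. The following sketch shows one way to express that test, assuming that "below the entry point" means the file's directory starts with the entry point's directory. The function names and URLs are hypothetical, and the check covers directory scope only; the harvester still has to reach a file through links.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit


def directory_of(url: str) -> str:
    """Return the directory part of the URL path, always ending in '/'."""
    path = urlsplit(url).path or "/"
    if not path.endswith("/"):
        path = posixpath.dirname(path)  # drop the file name
        if not path.endswith("/"):
            path += "/"
    return path


def levels_below(entry_point: str, candidate: str) -> Optional[int]:
    """Directory levels the candidate sits below the entry point, or None if out of scope."""
    if urlsplit(entry_point).netloc.lower() != urlsplit(candidate).netloc.lower():
        return None  # outside the entry point domain
    base = directory_of(entry_point)
    target = directory_of(candidate)
    if not target.startswith(base):
        return None  # not in or below the entry point directory
    return target[len(base):].count("/")


def in_current_path(entry_point: str, candidate: str, depth: int) -> bool:
    """True if the candidate is within `depth` directory levels below the entry point."""
    levels = levels_below(entry_point, candidate)
    return levels is not None and levels <= depth
```

For example, with the entry point http://www.example.org/reports/index.html and a depth of 1, a file in /reports/2004/ is in scope, a file in /reports/2004/q1/ is too deep, and a file in /about/ is excluded regardless of depth.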

Ingest content to digital archive

Important. Your digital archive record must contain the data in the fields below or it will fail validation when you try to ingest.

  • Title
  • ObjectLocator
  • DALanguage
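If you prepare records in bulk, a quick pre-check for these three fields can catch validation failures before you try to ingest. The sketch below assumes a record is available as a simple dictionary keyed by field name; that representation and the function name are hypothetical and not part of the product.

```python
from typing import Dict, List

# Field names taken from the list above.
REQUIRED_FIELDS = ("Title", "ObjectLocator", "DALanguage")


def missing_required_fields(record: Dict[str, str]) -> List[str]:
    """Return the required field names that are absent or blank in the record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

An empty result means the record has data in all three fields. It does not guarantee that the record passes every other check the system performs at ingest.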

Follow the steps below to ingest harvested content into the Digital Archive.

Important: Any record with a status of Harvest Complete can be ingested.

  Action
1 Edit a digital archive record.
From a results list. Click the Edit button next to the title of the record.
The record displays in Edit mode.
2 On the Actions list, click Ingest Document.
The system responds: Select Yes to submit ingest request to the Digital Archive with the following selections. Select No to return to the Edit page.
3 The default ingest properties are displayed. If you want to change these properties, select properties from the drop-down lists:

Service Level. Select service level from list.

  • Bit Preservation. Object is stored in the OCLC Digital Archive. This is the default setting.

  • Local. Object is to be disseminated from the OCLC Digital Archive for storage in a local archive.

Rights Statement. Select copyright statement from list.

Content Group. Select content group from list.

Authorization Group. Select authorization group from list.

Note: You can change these properties after ingest by using the Digital Archive Administration Module at http://digitalarchive.oclc.org/admin/.

4 Click Yes.
The system responds: Ingest Complete. Object and Digital Archive record moved from the save file to the Digital Archive.

Disseminate content from the digital archive

Note on workflow

In order to disseminate an object, you must have ingested it into the Digital Archive. However, the Disseminate Document action is available only in the digital archive save file. You must retrieve the record from the Digital Archive and resave it to the save file before you can disseminate it.

After dissemination, delete the record from the save file so you can edit it in the Digital Archive.

Disseminate a digital archive record

  Action
1 Retrieve a digital archive record.
From a results list. Click the title of the record.
The record displays.
2 On the Actions list, click Save Record.

Select a status from the list and click Yes.

The system confirms that the record has been saved to the save file.
3 On the Actions list, click Disseminate Document.

The Disseminate Options screen appears.
Note: Disseminate Document is available only in display mode.
4 PDF file. Go to step 5.
HTML file. Select a link option:

Original links. Links point to original file locations on the internet.
Relative links. Links point to digital archive file locations.
5 Click Disseminate.
Click Cancel to exit the dissemination process.

The Disseminate In-Process screen appears.
6 Click Exit.
The Disseminate In-Process screen closes.

Dissemination Information Package (DIP)

When you click Disseminate Document, you create a Dissemination Information Package (DIP), which contains the object you archived and its metadata. DIPs use the Metadata Encoding and Transmission Standard (METS).

For more information on METS, see http://www.loc.gov/standards/mets/.

The DIP is contained in a ZIP file.
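Once you have downloaded a DIP, you may want to locate its METS document programmatically. The sketch below opens a ZIP file and reports any XML member whose root element is the METS mets element. The internal layout and file names of a DIP are not described here, so the code makes no assumption about them beyond a .xml extension; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List

# Root element of a METS document, in the METS XML namespace.
METS_ROOT = "{http://www.loc.gov/METS/}mets"


def find_mets_documents(dip_zip) -> List[str]:
    """Return names of ZIP members whose XML root element is a METS <mets> element.

    `dip_zip` may be a file path or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(dip_zip) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # assumption: the METS document has an .xml extension
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                continue  # not well-formed XML, so not a METS document
            if root.tag == METS_ROOT:
                found.append(name)
    return found
```

Checking the root element rather than the file name keeps the sketch independent of how the package names its metadata file.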

Download Dissemination Information Package (DIP)

DIPs are deleted from the Dissemination Manager after 90 days. The object and metadata remain in the Digital Archive.
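To plan downloads around the retention period, you can work out the date by which a DIP should be retrieved. This small sketch assumes the 90 days are counted from the date of dissemination, which this page does not state explicitly; the function name is hypothetical.

```python
from datetime import date, timedelta

DIP_RETENTION_DAYS = 90  # retention period stated above


def dip_removal_date(disseminated_on: date) -> date:
    """Return the date 90 days after dissemination (assumed start of the count)."""
    return disseminated_on + timedelta(days=DIP_RETENTION_DAYS)
```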

  Action
1 On the Digital Archive tab, under Show, click Dissemination Manager.
The Dissemination Manager screen appears.
2 Click the file name.
The File Download box appears.
3 Select Save this file to disk. Click OK.
The Save as box appears.
4 Select the folder you want to save the ZIP file in.

Click Save. The ZIP file downloads to your computer.

Click Close. The Dissemination Manager screen closes.
5 Delete the record from the save file.

On the Actions list, click Delete from Save File.

Note: Deleting the record from the save file allows you to edit it in the Digital Archive.
