Creating an event-driven Lambda to query OCLC APIs

In our previous post, we discussed how to store persistent data from a serverless application. In this post, we’ll look at how serverless code is event driven and how events can be used to trigger code to interact with OCLC’s APIs.

Event-driven programming

All Lambda code is driven by events: it executes in response to events published to the AWS ecosystem, and several different services publish events that can be used to execute Lambda code. Many services, such as S3, invoke Lambdas asynchronously, meaning the service hands off the event and does not wait for the function to finish. However, some services, such as the API Gateway and Cognito, invoke Lambdas synchronously and wait for the function's response.
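
To make the synchronous/asynchronous distinction concrete, the sketch below builds the parameters a caller would pass to the AWS SDK for JavaScript (v2) Lambda.invoke() method. The helper buildInvokeParams and the function name my-function are hypothetical. The two InvocationType values are the ones the Lambda Invoke API defines: RequestResponse (the caller waits for the result) and Event (Lambda queues the event and returns immediately). Services such as S3 make the asynchronous kind of call on your behalf.

```javascript
// Hypothetical helper: builds parameters for the AWS SDK v2 Lambda.invoke() call.
// 'RequestResponse' = synchronous (the caller waits for the function's result).
// 'Event'           = asynchronous (Lambda queues the event and returns at once).
function buildInvokeParams(functionName, payload, { synchronous = false } = {}) {
  return {
    FunctionName: functionName,
    InvocationType: synchronous ? 'RequestResponse' : 'Event',
    Payload: JSON.stringify(payload),
  };
}

// Usage (not executed here), assuming aws-sdk v2 is installed:
//   const AWS = require('aws-sdk');
//   await new AWS.Lambda().invoke(buildInvokeParams('my-function', { hello: 'world' })).promise();
const asyncParams = buildInvokeParams('my-function', { hello: 'world' });
const syncParams = buildInvokeParams('my-function', { hello: 'world' }, { synchronous: true });
console.log(asyncParams.InvocationType, syncParams.InvocationType); // Event RequestResponse
```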

Driving Lambdas with events

When an event is published, it includes data from the service that published the event. The Lambda can then act on the data included in the event. Different services publish different data within their events. Amazon has good documentation and examples on the various event sources. Two examples of services that publish events that drive Lambda code are the API Gateway and S3.
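
Because each service shapes its event differently, a handler can tell S3 and API Gateway events apart by their structure. The sketch below is a hypothetical helper, not part of the series' code. It checks for the Records array with eventSource 'aws:s3' that S3 notifications use, and for the top-level httpMethod and path fields of an API Gateway REST API proxy event. HTTP API events in payload format 2.0 nest the method under requestContext.http instead, so this sketch would report them as unknown.

```javascript
// Hypothetical helper: guess which service published a Lambda event from its shape.
// S3 notifications arrive as { Records: [{ eventSource: 'aws:s3', s3: {...} }] };
// API Gateway REST API proxy events carry httpMethod and path at the top level.
function describeEventSource(event) {
  const record = Array.isArray(event.Records) ? event.Records[0] : undefined;
  if (record && record.eventSource === 'aws:s3') {
    return 's3';
  }
  if (typeof event.httpMethod === 'string' && typeof event.path === 'string') {
    return 'apigateway';
  }
  return 'unknown';
}

console.log(describeEventSource({ Records: [{ eventSource: 'aws:s3', s3: {} }] })); // s3
console.log(describeEventSource({ httpMethod: 'GET', path: '/' })); // apigateway
console.log(describeEventSource({})); // unknown
```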

API Gateway events are what drive the Lambda web application example discussed earlier in this series of posts. When any URL in the web application is requested through API Gateway, set up as a Lambda proxy integration, API Gateway passes the request data downstream to the Lambda as an event, and the Lambda executes the application code. Within the application code, the incoming event is handled by the awsServerlessExpress library, which extracts the relevant information to run the application.
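
The awsServerlessExpress library hides the raw event from the Express application. As a rough sketch of what it deals with, the hypothetical handler below skips that library. It reads a few fields from an API Gateway REST API proxy event and returns the response shape a proxy integration expects: a statusCode, optional headers, and a body that must be a string. The name query parameter is an illustrative assumption.

```javascript
// Minimal sketch of a handler behind an API Gateway REST API Lambda proxy integration.
// It does not use awsServerlessExpress; it reads the raw proxy event directly.
// In a deployed function this would be exported as exports.handler.
async function handler(event) {
  // queryStringParameters is null when the request has no query string.
  const params = event.queryStringParameters || {};
  const name = params.name || 'world';

  // A proxy integration expects statusCode, optional headers, and a string body.
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ method: event.httpMethod, path: event.path, greeting: `Hello, ${name}` }),
  };
}

// Local usage with a trimmed-down sample event (real events carry many more fields).
handler({ httpMethod: 'GET', path: '/hello', queryStringParameters: { name: 'OCLC' } })
  .then((response) => console.log(response.statusCode, response.body));
```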

In our next example of Lambda code, we’re going to use events from S3 to trigger the Lambda. S3 can send events when a variety of things occur, such as objects being created or deleted. To have a Lambda triggered when an event happens in S3, the S3 bucket needs to be configured to emit events under particular conditions. Here, the bucket emits an event when files ending in .csv are added to the bucket.
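
The bucket setting can also be expressed in code. The sketch below builds the NotificationConfiguration object that the S3 PutBucketNotificationConfiguration API accepts, with a suffix filter for .csv keys on object-created events. The helper name, bucket name, and function ARN are hypothetical placeholders. S3 also needs permission to invoke the function, which is granted separately through the Lambda's resource-based policy.

```javascript
// Hypothetical helper: builds the NotificationConfiguration that tells an S3 bucket
// to invoke a Lambda whenever an object whose key ends in ".csv" is created.
function buildCsvNotificationConfig(lambdaArn) {
  return {
    LambdaFunctionConfigurations: [
      {
        LambdaFunctionArn: lambdaArn,
        Events: ['s3:ObjectCreated:*'],
        Filter: {
          Key: {
            FilterRules: [{ Name: 'suffix', Value: '.csv' }],
          },
        },
      },
    ],
  };
}

// Usage (not executed here), assuming aws-sdk v2:
//   await new AWS.S3().putBucketNotificationConfiguration({
//     Bucket: 'my-bucket',
//     NotificationConfiguration: buildCsvNotificationConfig(lambdaArn),
//   }).promise();
const config = buildCsvNotificationConfig('arn:aws:lambda:us-east-1:123456789012:function:example');
console.log(JSON.stringify(config, null, 2));
```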

The event emitted will contain some key bits of information including the following:

  • Event source
  • Event time
  • Event name
  • Bucket name
  • Object key
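
For orientation, here is a trimmed-down sketch of where those fields sit in an S3 notification, with made-up values, plus a hypothetical helper that pulls them out of each record. Real records carry more fields than shown. The object key arrives URL-encoded, which is why the sample key contains a plus sign where the file name has a space.

```javascript
// Trimmed-down S3 "object created" notification. Values are made up for illustration;
// real records carry more fields (awsRegion, userIdentity, object size, eTag, ...).
const sampleEvent = {
  Records: [
    {
      eventSource: 'aws:s3',
      eventTime: '2020-01-01T12:00:00.000Z',
      eventName: 'ObjectCreated:Put',
      s3: {
        bucket: { name: 'example-bucket' },
        object: { key: 'oclc+numbers.csv' }, // URL-encoded: the space is sent as '+'
      },
    },
  ],
};

// Hypothetical helper: pull the five fields listed above out of each record.
function summarizeS3Event(event) {
  return event.Records.map((record) => ({
    source: record.eventSource,
    time: record.eventTime,
    name: record.eventName,
    bucket: record.s3.bucket.name,
    key: record.s3.object.key,
  }));
}

console.log(summarizeS3Event(sampleEvent));
```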

The Lambda code acts on this data, starting by extracting the bucket name and the object key from the event.

const bucket = event.Records[0].s3.bucket.name;
const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
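
The key-decoding line exists because S3 URL-encodes object keys in event notifications, sending a space as a plus sign. The sketch below wraps that same one-liner in a hypothetical decodeS3Key helper. The order matters: the plus signs are replaced with spaces first, and a real plus sign in a file name (sent as %2B) is only restored afterwards by decodeURIComponent, so it survives intact.

```javascript
// S3 URL-encodes object keys in event notifications, with spaces encoded as '+'.
// Replacing '+' first and then calling decodeURIComponent recovers the real key,
// matching the one-liner used in the handler.
function decodeS3Key(rawKey) {
  return decodeURIComponent(rawKey.replace(/\+/g, ' '));
}

console.log(decodeS3Key('my+holdings+%282020%29.csv')); // my holdings (2020).csv
console.log(decodeS3Key('a%2Bb.csv')); // a+b.csv
```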

Using this information, the code creates a filename for the output data, loads the CSV file data into a records variable, and creates an array of IDs from the oclcnumber column of each row in the CSV.

// Assumes s3 (an aws-sdk v2 S3 client) and parse (csv-parse's synchronous parser) are in scope.
const dstKey = "event_log_" + key;
try {
    const data = await s3.getObject({Bucket: bucket, Key: key}).promise();
    const records = parse(data.Body, {columns: true});
    // create an array of the OCLC numbers from each row's oclcnumber column
    const ids = records.map(record => record.oclcnumber);
    // more code
} catch (err) {
    console.log(err, err.stack);
    return err;
}
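
Assuming parse is the csv-parse package's synchronous parser, the columns: true option turns each row into an object keyed by the CSV header row. The sketch below uses hand-written sample records in place of real parser output to show the data shapes involved, along with the output key derivation. The extra title column and all values are made up.

```javascript
// Stand-in for parse(data.Body, { columns: true }) on a CSV whose header row is "oclcnumber,title".
// Sample values are made up for illustration.
const records = [
  { oclcnumber: '123', title: 'First title' },
  { oclcnumber: '456', title: 'Second title' },
];

// One OCLC number per row, in row order.
const ids = records.map((record) => record.oclcnumber);

// The output file name is the input key with a fixed prefix.
const key = 'holdings.csv';
const dstKey = 'event_log_' + key;

console.log(ids, dstKey); // [ '123', '456' ] event_log_holdings.csv
```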

The ids variable is an array of OCLC numbers for which the application needs to look up the current OCLC number. Next, the code joins the array into a comma-separated list, appends it to the request URL, and performs the API call for the lookups. Once the data is retrieved, the code extracts the current OCLC numbers from the XML response, loops through them, and adds each one back to the matching row in the original records array.
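
Concatenating an array onto a string in JavaScript calls Array.prototype.toString(), which joins the elements with commas. That is why appending ids directly to the URL produces a comma-separated list. The hypothetical buildCheckUrl helper below makes the join explicit and checks that it matches the implicit form.

```javascript
// Build the checkcontrolnumbers request URL from an array of OCLC numbers.
// Concatenating an array onto a string calls Array.prototype.toString(),
// which joins the elements with commas; join(',') makes that explicit.
function buildCheckUrl(ids) {
  return 'https://worldcat.org/bib/checkcontrolnumbers?oclcNumbers=' + ids.join(',');
}

const ids = ['123', '456'];
console.log(buildCheckUrl(ids)); // https://worldcat.org/bib/checkcontrolnumbers?oclcNumbers=123,456
// Same result as the implicit conversion:
console.log(buildCheckUrl(ids) === 'https://worldcat.org/bib/checkcontrolnumbers?oclcNumbers=' + ids); // true
```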

// Assumes accessToken (an OCLC access token), axios, xpath, and dom (an xmldom-style DOMParser) are in scope.
const request_config = {
    headers: {
        'Authorization': 'Bearer ' + accessToken.getAccessTokenString(),
        'Accept': 'application/atom+xml',
        'User-Agent': 'node.js KAC client'
    }
};
// join the OCLC numbers into a comma-separated list for the query string
const url = "https://worldcat.org/bib/checkcontrolnumbers?oclcNumbers=" + ids.join(',');
try {
    const request_response = await axios.get(url, request_config);
    const doc = new dom().parseFromString(request_response.data);
    const select = xpath.useNamespaces({"atom": "http://www.w3.org/2005/Atom", "metadata": "http://worldcat.org/metadata-api-service"});
    const newIdNodes = select('//atom:content/metadata:oclcNumberRecordResult/metadata:currentOclcNumber', doc);
    const newIds = newIdNodes.map(newIdNode => newIdNode.firstChild.data);
    // assumes the results come back in the same order as the requested OCLC numbers
    records.forEach((record, index) => {
        record['newOCLCNum'] = newIds[index];
    });
    // create new CSV file
} catch (err) {
    console.log(err, err.stack);
    return err;
}
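
The merge step pairs each current OCLC number with a record by position, so it only works if the API returns one result per requested number, in request order. The hypothetical addNewNumbers helper below sketches the same index-based merge, adding a length check that fails loudly if the counts differ. It returns new objects rather than mutating the originals, which keeps the parsed input unchanged if the merge throws.

```javascript
// Hypothetical helper: copy each current OCLC number onto the record at the same index.
// Like the loop in the handler, it assumes the API returns results in the same order
// as the OCLC numbers in the request; it throws if the counts differ.
function addNewNumbers(records, newIds) {
  if (records.length !== newIds.length) {
    throw new Error(`Expected ${records.length} results, got ${newIds.length}`);
  }
  return records.map((record, index) => ({ ...record, newOCLCNum: newIds[index] }));
}

const merged = addNewNumbers(
  [{ oclcnumber: '123' }, { oclcnumber: '456' }],
  ['123', '789'] // made-up "current" numbers
);
console.log(merged);
```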


Finally, the records are converted back to CSV, with a header row labeling the original and new OCLC numbers, and written to the original S3 bucket under the filename defined earlier.

// Assumes stringify is csv-stringify's synchronous stringifier.
// Map each record field to the header label used in the output CSV.
const columns = {
    oclcnumber: "Original OCLC Number",
    newOCLCNum: "New OCLC Number"
};
const csv_string = stringify(records, {header: true, columns: columns});

try {
    await s3.putObject({Bucket: bucket, Key: dstKey, Body: csv_string}).promise();
    console.log('success');
    return { status: 'success' };
} catch (err) {
    console.log(err, err.stack);
    return err;
}
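
Because the output file name is the input key with an event_log_ prefix, it also ends in .csv. With a suffix-only event filter like the one described earlier, writing it back to the same bucket would fire a new event, so the Lambda would run again on its own output, and so on. Common ways to avoid this are a prefix filter on the notification, a separate output bucket, or an early return in the handler. The sketch below shows the last option with a hypothetical shouldProcess helper. The handler would call it right after decoding the key and return early when it is false.

```javascript
// Hypothetical guard: skip objects this function wrote itself, so writing
// "event_log_<name>.csv" back to the same bucket does not start another run.
const OUTPUT_PREFIX = 'event_log_';

function shouldProcess(key) {
  return key.endsWith('.csv') && !key.startsWith(OUTPUT_PREFIX);
}

// In the handler (sketch): if (!shouldProcess(key)) return { status: 'skipped' };
console.log(shouldProcess('holdings.csv')); // true
console.log(shouldProcess('event_log_holdings.csv')); // false
```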

The final source code for this application is available at: https://github.com/OCLC-Developer-Network/code4lib_serverless_triggered.

Next steps

In this post, we’ve developed a basic understanding of how Lambdas can execute based on an input event. We created a Lambda that reads a CSV file of OCLC numbers from an S3 bucket, queries the OCLC Metadata API for the current OCLC number of each, and writes a new CSV file containing both the original and current numbers back to the bucket. In our next post, we’ll look at how Lambda code can be run on a schedule.

Karen Coombs

Senior Product Analyst