Integrating webhooks
Learn how to integrate webhooks into your Actors. Webhooks are a powerful tool that let you react to Actor run events - for example, by starting another Actor or calling an external API once a run finishes.
In this lesson we'll be writing a new Actor and integrating it with our beloved Amazon scraping Actor. First, we'll navigate to the same directory where our demo-actor folder lives, and run apify create filter-actor
(once again, you can name the Actor whatever you want, but for this lesson, we'll be calling the new Actor filter-actor). When prompted about the programming language, select JavaScript:
$ apify create filter-actor
? Choose the programming language of your new Actor:
❯ JavaScript
TypeScript
Python
Then use the down arrow key to select Empty JavaScript Project:
$ apify create filter-actor
✔ Choose the programming language of your new Actor: JavaScript
? Choose a template for your new Actor. Detailed information about the template will be shown in the next step.
Crawlee + Playwright + Chrome
Crawlee + Playwright + Camoufox
Bootstrap CheerioCrawler
Cypress
❯ Empty JavaScript Project
Standby JavaScript Project
...
As a last step, confirm your choices by selecting Install template and wait until the new Actor is ready.
Building the new Actor
First of all, we should clear out any of the boilerplate code within main.js to get a clean slate:
// main.js
import { Actor } from 'apify';
await Actor.init();
// ...
await Actor.exit();
We'll be passing the ID of the Amazon Actor's default dataset along to the new Actor, so we can expect that as an input:
const { datasetId } = await Actor.getInput();
const dataset = await Actor.openDataset(datasetId);
// ...
You will need to use the forceCloud option - Actor.openDataset(<name/id>, { forceCloud: true }) - to open a dataset from platform storage while running the Actor locally.
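For example, here's a minimal sketch of opening the dataset so the same code works both locally and on the platform - it assumes the Actor.isAtHome() helper from the Apify SDK, which reports whether the code is running on the Apify platform:
// Open the dataset from platform storage when running locally;
// on the platform, the default storages are already in the cloud.
const dataset = await Actor.openDataset(datasetId, {
    forceCloud: !Actor.isAtHome(),
});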
Next, we'll grab hold of the dataset's items with the dataset.getData() function:
const { items } = await dataset.getData();
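To make the filtering logic easier to follow, here's the rough shape we're assuming each item has, based on the Amazon Actor from the previous lessons - one offer per item, identified by its ASIN, with the price stored as a string:
// An assumed example item from the Amazon Actor's dataset:
const exampleItem = {
    title: 'iPhone Charger Cable', // hypothetical product
    asin: 'B07S6F9MLP', // hypothetical ASIN
    itemUrl: 'https://www.amazon.com/dp/B07S6F9MLP',
    offer: '$9.99', // price as a string, which is why we'll use .slice(1) below
};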
While several methods could achieve this Actor's goal, using Array.reduce() is the most concise approach:
const filtered = items.reduce((acc, curr) => {
// Grab the price of the item matching our current
// item's ASIN in the map. If it doesn't exist, set
// "prevPrice" to null
const prevPrice = acc?.[curr.asin] ? +acc[curr.asin].offer.slice(1) : null;
// Grab the price of our current offer
const price = +curr.offer.slice(1);
// If the item doesn't yet exist in the map, add it.
// Or, if the current offer's price is less than the
// saved one, replace the saved one
if (!acc[curr.asin] || prevPrice > price) acc[curr.asin] = curr;
// Return the map
return acc;
}, {});
The results should be an array, so we can take the map we just created and push an array of its values to the Actor's default dataset:
await Actor.pushData(Object.values(filtered));
Our final code looks like this:
import { Actor } from 'apify';
await Actor.init();
const { datasetId } = await Actor.getInput();
const dataset = await Actor.openDataset(datasetId);
const { items } = await dataset.getData();
const filtered = items.reduce((acc, curr) => {
const prevPrice = acc?.[curr.asin] ? +acc[curr.asin].offer.slice(1) : null;
const price = +curr.offer.slice(1);
if (!acc[curr.asin] || prevPrice > price) acc[curr.asin] = curr;
return acc;
}, {});
await Actor.pushData(Object.values(filtered));
await Actor.exit();
Cool! But wait, don't forget to configure the INPUT_SCHEMA.json file as well! This step isn't strictly necessary, since we'll be calling the Actor through Apify's API from a webhook, but it's good to get into the habit of writing quality input schemas that describe the input values your Actor expects.
{
"title": "Amazon Filter Actor",
"type": "object",
"schemaVersion": 1,
"properties": {
"datasetId": {
"title": "Dataset ID",
"type": "string",
"description": "Enter the ID of the dataset.",
"editor": "textfield"
}
},
"required": ["datasetId"]
}
Now we're done, and we can push it up to the Apify platform with the apify push command:
$ apify push
Info: Created Actor with name filter-actor on Apify.
Info: Deploying Actor 'filter-actor' to Apify.
Run: Updated version 0.0 for Actor filter-actor.
Run: Building Actor filter-actor
(timestamp) ACTOR: Extracting Actor documentation from README.md
(timestamp) ACTOR: Building Docker image.
...
(timestamp) ACTOR: Pushing Docker image to repository.
(timestamp) ACTOR: Build finished.
Actor build detail https://console.apify.com/actors/Yk1bieximsduYDydP#/builds/0.0.1
Actor detail https://console.apify.com/actors/Yk1bieximsduYDydP
Success: Actor was deployed to Apify cloud and built there.
Setting up the webhook
We'll use the Apify API to set up the webhook. To compose the HTTP request, we'll need either the ID of our Actor or its technical name. Let's take a second look at the end of the output of the apify push command:
...
Actor build detail https://console.apify.com/actors/Yk1bieximsduYDydP#/builds/0.0.1
Actor detail https://console.apify.com/actors/Yk1bieximsduYDydP
Success: Actor was deployed to Apify cloud and built there.
The URLs tell us that our Actor's ID is Yk1bieximsduYDydP. With this actorId, and our token, which is retrievable through Settings > Integrations on the Apify Console, we can construct a link which will call the Actor:
https://api.apify.com/v2/acts/Yk1bieximsduYDydP/runs?token=YOUR_TOKEN_HERE
We can also use our username and the name of the Actor like this:
https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE
Which one you use is entirely up to your preference.
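If you'd like to test the endpoint manually before wiring it into a webhook, a sketch like the one below could work from Node.js 18+ (which has fetch built in). The dataset ID here is just a hypothetical placeholder, and YOUR_TOKEN_HERE stands in for your real API token:
// Start a run of filter-actor through the API, passing the Actor input as the request body.
const response = await fetch(
    'https://api.apify.com/v2/acts/USERNAME~filter-actor/runs?token=YOUR_TOKEN_HERE',
    {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // The body becomes the Actor's input - the same shape our INPUT_SCHEMA.json describes.
        body: JSON.stringify({ datasetId: 'SOME_DATASET_ID' }), // hypothetical dataset ID
    },
);
console.log(await response.json()); // details of the newly started run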
Next, within the Amazon scraping Actor, we will click the Integrations tab and choose Webhook, then fill out the details to look like this:
We have chosen to run the webhook once the Actor has succeeded, which means that its default dataset will surely be populated. Since the filtering Actor expects the default dataset ID of the Amazon Actor, we use the resource variable to grab hold of the defaultDatasetId.
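For reference, the webhook's payload template could look something like the sketch below - it passes the triggering run's default dataset ID as the datasetId input our schema expects (the exact template depends on how you fill out the form):
{
    "datasetId": {{resource.defaultDatasetId}}
}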
Click Save, then run the Amazon demo-actor again.
Making sure it worked
If everything worked, then at the end of the demo-actor's run, we should see this within the Integrations tab:
Additionally, we should be able to see that our filter-actor was run, and have access to its dataset:
Quiz answers 📝
Q: How do you allocate more CPU for an Actor's run?
A: On the platform, more memory can be allocated in the Actor's run options (next to the input configuration), and the default allocation can be changed in the Actor's Settings tab. Since CPU is allocated proportionally to memory - 4 GB of memory corresponds to 1 CPU core on the Apify platform - allocating more memory also allocates more CPU. When running locally, you can use the APIFY_MEMORY_MBYTES environment variable to set the allocated memory.
Q: Within itself, can you get the exact time that an Actor was started?
A: Yes. The time the Actor was started can be retrieved through the startedAt property from the Actor.getEnv() function, or directly from the process.env.APIFY_STARTED_AT environment variable.
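A minimal sketch of the first approach, inside an initialized Actor:
// startedAt is a Date object describing when the current run began.
const { startedAt } = Actor.getEnv();
console.log(startedAt.toISOString());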
Q: What are the types of default storages connected to an Actor's run?
A: Every Actor's run is given a default key-value store and a default dataset. The key-value store holds the INPUT and OUTPUT keys by default. The Actor's default request queue is also stored.
Q: Can you change the allocated memory of an Actor while it's running?
A: Not while it's running. You'd need to abort the run and start a new one. However, there is an option to soft abort an Actor and then resurrect the run with a different memory configuration.
Q: How can you run an Actor with Puppeteer on the Apify platform with headless mode set to false?
A: This can be done by using the actor-node-puppeteer-chrome Docker image and making sure that launchContext.launchOptions.headless in PuppeteerCrawlerOptions is set to false.
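A minimal sketch of those options, assuming Crawlee's PuppeteerCrawler is being used:
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    launchContext: {
        // Run the browser with a visible window. On the platform this also
        // requires the actor-node-puppeteer-chrome image, which can run a
        // headful browser.
        launchOptions: { headless: false },
    },
    async requestHandler({ page }) {
        // ... scraping logic goes here
    },
});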
Wrap up
See that?! Integrating webhooks is a piece of cake on the Apify platform! You'll soon discover that the platform abstracts away a lot of the complexity and allows you to focus on what's most important - developing and releasing Actors.