Scrape with JS Rendering with ScrapeNinja API on Changes to Specific Files (Shared Drive) from Google Drive API

Pipedream makes it easy to connect APIs for ScrapeNinja, Google Drive and 3,000+ other apps remarkably fast.

Trigger workflow on

Changes to Specific Files (Shared Drive) from the Google Drive API

Next, do this

Scrape with JS Rendering with the ScrapeNinja API

Trusted by 1,000,000+ developers from startups to Fortune 500 companies

Getting Started#

This integration creates a workflow with a Google Drive trigger and ScrapeNinja action. When you configure and deploy the workflow, it will run on Pipedream's servers 24x7 for free.

Select this integration
Configure the Changes to Specific Files (Shared Drive) trigger
1. Connect your Google Drive account
2. Select a Drive
3. Configure Push notification renewal schedule
4. Select one or more Files
5. Optional: Configure Include Link
6. Optional: Configure dir
7. Configure intervalAlert
8. Optional: Configure Minimum Interval Per File
Configure the Scrape with JS Rendering action
1. Connect your ScrapeNinja account
2. Configure URL
3. Optional: Configure Wait For Selector
4. Optional: Configure Post Wait Time
5. Optional: Configure Dump Iframe
6. Optional: Configure Wait For Selector Iframe
7. Optional: Configure Extractor Target Iframe
8. Optional: Configure Headers
9. Optional: Configure Retry Number
10. Optional: Configure Geo
11. Optional: Configure Proxy
12. Optional: Configure Timeout
13. Optional: Configure Text Not Expected
14. Optional: Configure Status Not Expected
15. Optional: Configure Block Images
16. Optional: Configure Block Media
17. Optional: Configure Screenshot
18. Optional: Configure Catch Ajax Headers URL Mask
19. Optional: Configure Viewport Width
20. Optional: Configure Viewport Height
21. Optional: Configure Viewport Device Scale Factor
22. Optional: Configure Viewport Has Touch
23. Optional: Configure Viewport Is Mobile
24. Optional: Configure Viewport Is Landscape
25. Optional: Configure Extractor
Deploy the workflow
Send a test event to validate your setup
Turn on the trigger

Details#

This integration uses pre-built, source-available components from Pipedream's GitHub repo. These components are developed by Pipedream and the community, and verified and maintained by Pipedream.

To contribute an update to an existing component or create a new component, create a PR on GitHub. If you're new to Pipedream component development, you can start with the quickstarts for trigger and action development, and then review the component API reference.

Trigger#

Changes to Specific Files (Shared Drive) on Google Drive

Description: Watches for changes to specific files in a shared drive, emitting an event when a change is made to one of those files

Version: 0.3.2

Key: google_drive-changes-to-specific-files-shared-drive

Google Drive Overview#

The Google Drive API on Pipedream allows you to automate various file management tasks, such as creating, reading, updating, and deleting files within your Google Drive. You can also share files, manage permissions, and monitor changes to files and folders. This opens up possibilities for creating workflows that seamlessly integrate with other apps and services, streamlining document handling, backup processes, and collaborative workflows.

Trigger Code#

// This source processes changes to specific files in a user's Google Drive,
// implementing strategy enumerated in the Push Notifications API docs:
// https://developers.google.com/drive/api/v3/push .
//
// This source has two interfaces:
//
// 1) The HTTP requests tied to changes in files in the user's Google Drive
// 2) A timer that runs on regular intervals, renewing the notification channel as needed

import common from "../common-webhook.mjs";
import sampleEmit from "./test-event.mjs";

import {
  GOOGLE_DRIVE_NOTIFICATION_ADD,
  GOOGLE_DRIVE_NOTIFICATION_CHANGE,
  GOOGLE_DRIVE_NOTIFICATION_UPDATE,
} from "../../common/constants.mjs";
import commonDedupeChanges from "../common-dedupe-changes.mjs";
import { stashFile } from "../../common/utils.mjs";

/**
 * This source uses the Google Drive API's
 * {@link https://developers.google.com/drive/api/v3/reference/changes/watch changes: watch}
 * endpoint to subscribe to changes to the user's drive or a shared drive.
 */
export default {
  ...common,
  key: "google_drive-changes-to-specific-files-shared-drive",
  name: "Changes to Specific Files (Shared Drive)",
  description: "Watches for changes to specific files in a shared drive, emitting an event when a change is made to one of those files",
  version: "0.3.2",
  type: "source",
  // Dedupe events based on the "x-goog-message-number" header for the target channel:
  // https://developers.google.com/drive/api/v3/push#making-watch-requests
  dedupe: "unique",
  props: {
    ...common.props,
    files: {
      type: "string[]",
      label: "Files",
      description: "The files you want to watch for changes.",
      options({ prevContext }) {
        const { nextPageToken } = prevContext;
        return this.googleDrive.listFilesOptions(nextPageToken, this.getListFilesOpts());
      },
    },
    includeLink: {
      label: "Include Link",
      type: "boolean",
      description: "Upload file to your File Stash and emit temporary download link to the file. Google Workspace documents will be converted to PDF. See [the docs](https://pipedream.com/docs/connect/components/files) to learn more about working with files in Pipedream.",
      default: false,
      optional: true,
    },
    dir: {
      type: "dir",
      accessMode: "write",
      optional: true,
    },
    ...commonDedupeChanges.props,
  },
  hooks: {
    async deploy() {
      const daysAgo = new Date();
      daysAgo.setDate(daysAgo.getDate() - 30);
      const timeString = daysAgo.toISOString();

      const args = this.getListFilesOpts({
        q: `mimeType != "application/vnd.google-apps.folder" and modifiedTime > "${timeString}" and trashed = false`,
        fields: "files",
        pageSize: 5,
      });

      const { files } = await this.googleDrive.listFilesInPage(null, args);

      await this.processChanges(files);
    },
    ...common.hooks,
  },
  methods: {
    ...common.methods,
    getUpdateTypes() {
      return [
        GOOGLE_DRIVE_NOTIFICATION_ADD,
        GOOGLE_DRIVE_NOTIFICATION_CHANGE,
        GOOGLE_DRIVE_NOTIFICATION_UPDATE,
      ];
    },
    generateMeta(data, headers) {
      const {
        id: fileId,
        name: fileName,
        modifiedTime: tsString,
      } = data;
      const ts = Date.parse(tsString);
      const resourceState = headers && headers["x-goog-resource-state"];

      const summary = resourceState
        ? `${resourceState.toUpperCase()} - ${fileName || "Untitled"}`
        : fileName || "Untitled";

      return {
        id: `${fileId}-${ts}`,
        summary,
        ts,
      };
    },
    isFileRelevant(file) {
      return this.files.includes(file.id);
    },
    getChanges(headers) {
      if (!headers) {
        return {
          change: { },
        };
      }
      return {
        change: {
          state: headers["x-goog-resource-state"],
          resourceURI: headers["x-goog-resource-uri"],
          changed: headers["x-goog-changed"], // "Additional details about the changes. Possible values: content, parents, children, permissions"
        },
      };
    },
    async processChange(file, headers) {
      const changes = this.getChanges(headers);
      const fileInfo = await this.googleDrive.getFile(file.id);
      if (this.includeLink) {
        fileInfo.file = await stashFile(file, this.googleDrive, this.dir);
      }
      const eventToEmit = {
        file: fileInfo,
        ...changes,
      };
      const meta = this.generateMeta(fileInfo, headers);
      this.$emit(eventToEmit, meta);
    },
    async processChanges(changedFiles, headers) {
      console.log(`Processing ${changedFiles.length} changed files`);
      console.log(`Changed files: ${JSON.stringify(changedFiles, null, 2)}!!!`);
      console.log(`Files: ${this.files}!!!`);

      const filteredFiles = this.checkMinimumInterval(changedFiles);
      for (const file of filteredFiles) {
        if (!this.isFileRelevant(file)) {
          console.log(`Skipping event for irrelevant file ${file.id}`);
          continue;
        }
        await this.processChange(file, headers);
      }
    },
  },
  sampleEmit,
};
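
To see what a downstream step receives, the event shape and dedupe ID can be reproduced outside Pipedream. The sketch below copies the logic of `getChanges` and `generateMeta` from the source above into standalone functions. The sample file object and header values are made up for illustration and are not real Google Drive output.

```javascript
// Standalone re-creation of getChanges() and generateMeta() from the source above.
// sampleFile and sampleHeaders are illustrative values, not real API output.
function getChanges(headers) {
  if (!headers) return { change: {} };
  return {
    change: {
      state: headers["x-goog-resource-state"],
      resourceURI: headers["x-goog-resource-uri"],
      changed: headers["x-goog-changed"],
    },
  };
}

function generateMeta(data, headers) {
  const { id: fileId, name: fileName, modifiedTime: tsString } = data;
  const ts = Date.parse(tsString);
  const resourceState = headers && headers["x-goog-resource-state"];
  const summary = resourceState
    ? `${resourceState.toUpperCase()} - ${fileName || "Untitled"}`
    : fileName || "Untitled";
  // The id combines the file ID with its modification time.
  return { id: `${fileId}-${ts}`, summary, ts };
}

const sampleFile = {
  id: "file123",
  name: "Q3 report",
  modifiedTime: "2024-01-01T00:00:00.000Z",
};
const sampleHeaders = {
  "x-goog-resource-state": "update",
  "x-goog-changed": "content",
};

// Same shape as the object passed to this.$emit() in processChange().
const event = { file: sampleFile, ...getChanges(sampleHeaders) };
const meta = generateMeta(sampleFile, sampleHeaders);
console.log(event.change.changed, "|", meta.summary, "|", meta.id);
```

Because the ID is built from the file ID and `modifiedTime`, two notifications for the same file revision produce the same ID.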

Trigger Configuration#

This component may be configured based on the props defined in the component code. Pipedream automatically prompts for input values in the UI and CLI.

Label	Prop	Type	Description
Google Drive	`googleDrive`	`app`	This component uses the Google Drive app.
N/A	`db`	`$.service.db`	This component uses `$.service.db` to maintain state between executions.
N/A	`http`	`$.interface.http`	This component uses `$.interface.http` to generate a unique URL when the component is first instantiated. Each request to the URL will trigger the `run()` method of the component.
Drive	`drive`	`string`	Select a value from the drop down menu.
Files	`files`	`string[]`	Select a value from the drop down menu.
Include Link	`includeLink`	`boolean`	Upload file to your File Stash and emit temporary download link to the file. Google Workspace documents will be converted to PDF. See the docs to learn more about working with files in Pipedream.
	`dir`	`dir`
Minimum Interval Per File	`perFileInterval`	`integer`	How many minutes to wait until the same file can emit another event. If set to `0`, this interval is disabled and all events will be emitted.
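
The `perFileInterval` behavior can be pictured as a small throttle keyed by file ID. The sketch below is an assumption about how such a check might work, not the code of `common-dedupe-changes.mjs`, which is not shown on this page. `createIntervalFilter` and its in-memory `Map` are hypothetical stand-ins for state the component would keep in `$.service.db`.

```javascript
// Hypothetical per-file throttle; the real component persists state via $.service.db.
function createIntervalFilter(perFileIntervalMinutes) {
  const lastEmitted = new Map(); // file ID -> timestamp (ms) of the last emitted event

  return function filterFiles(files, now = Date.now()) {
    // An interval of 0 disables throttling, matching the prop description.
    if (!perFileIntervalMinutes) return files;

    const intervalMs = perFileIntervalMinutes * 60 * 1000;
    return files.filter((file) => {
      const last = lastEmitted.get(file.id);
      if (last !== undefined && now - last < intervalMs) return false;
      lastEmitted.set(file.id, now);
      return true;
    });
  };
}

const filterFiles = createIntervalFilter(5);
const t0 = 1000000;
console.log(filterFiles([{ id: "a" }], t0).length);             // first event for "a"
console.log(filterFiles([{ id: "a" }], t0 + 60000).length);     // one minute later
console.log(filterFiles([{ id: "a" }], t0 + 5 * 60000).length); // five minutes later
```

With a five-minute interval, the second call is filtered out and the third passes again, since the full interval has elapsed since the last emitted event.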

Trigger Authentication#

Google Drive uses OAuth authentication. When you connect your Google Drive account, Pipedream opens a popup window where you can sign in to Google Drive and grant Pipedream permission to connect to your account. Pipedream securely stores and automatically refreshes the OAuth tokens, so you can easily authenticate to any Google Drive API.

Pipedream requests the following authorization scopes when you connect your account:

https://www.googleapis.com/auth/drive

About Google Drive#

Google Drive is a file storage and synchronization service which allows you to create and share your work online, and access your documents from anywhere.

Action#

Scrape with JS Rendering on ScrapeNinja

Description: Uses the ScrapeNinja real Chrome browser engine to scrape pages that require JS rendering. [See the documentation](https://scrapeninja.net/docs/api-reference/scrape-js/)

Version: 0.0.2

Key: scrapeninja-scrape-with-js-rendering

ScrapeNinja Overview#

ScrapeNinja API on Pipedream allows you to craft powerful serverless workflows for web scraping without the hassle of managing proxies or browsers. It's a tool that can extract data from websites, handling JavaScript rendering and anti-bot measures with ease. By integrating ScrapeNinja with Pipedream, you can automate data collection, collate and process the scraped data, and connect it to numerous other services for further analysis, alerting, or storage.

Action Code#

import { ConfigurationError } from "@pipedream/platform";
import {
  clearObj,
  parseError, parseObject,
} from "../../common/utils.mjs";
import scrapeninja from "../../scrapeninja.app.mjs";

export default {
  key: "scrapeninja-scrape-with-js-rendering",
  name: "Scrape with JS Rendering",
  description: "Uses the ScrapeNinja real Chrome browser engine to scrape pages that require JS rendering. [See the documentation](https://scrapeninja.net/docs/api-reference/scrape-js/)",
  version: "0.0.2",
  annotations: {
    destructiveHint: false,
    openWorldHint: true,
    readOnlyHint: false,
  },
  type: "action",
  props: {
    scrapeninja,
    url: {
      propDefinition: [
        scrapeninja,
        "url",
      ],
    },
    waitForSelector: {
      propDefinition: [
        scrapeninja,
        "waitForSelector",
      ],
      optional: true,
    },
    postWaitTime: {
      propDefinition: [
        scrapeninja,
        "postWaitTime",
      ],
      optional: true,
    },
    dumpIframe: {
      propDefinition: [
        scrapeninja,
        "dumpIframe",
      ],
      optional: true,
    },
    waitForSelectorIframe: {
      propDefinition: [
        scrapeninja,
        "waitForSelectorIframe",
      ],
      optional: true,
    },
    extractorTargetIframe: {
      propDefinition: [
        scrapeninja,
        "extractorTargetIframe",
      ],
      optional: true,
    },
    headers: {
      propDefinition: [
        scrapeninja,
        "headers",
      ],
      optional: true,
    },
    retryNum: {
      propDefinition: [
        scrapeninja,
        "retryNum",
      ],
      optional: true,
    },
    geo: {
      propDefinition: [
        scrapeninja,
        "geo",
      ],
      optional: true,
    },
    proxy: {
      propDefinition: [
        scrapeninja,
        "proxy",
      ],
      optional: true,
    },
    timeout: {
      propDefinition: [
        scrapeninja,
        "timeout",
      ],
      optional: true,
    },
    textNotExpected: {
      propDefinition: [
        scrapeninja,
        "textNotExpected",
      ],
      optional: true,
    },
    statusNotExpected: {
      propDefinition: [
        scrapeninja,
        "statusNotExpected",
      ],
      optional: true,
    },
    blockImages: {
      propDefinition: [
        scrapeninja,
        "blockImages",
      ],
      optional: true,
    },
    blockMedia: {
      propDefinition: [
        scrapeninja,
        "blockMedia",
      ],
      optional: true,
    },
    screenshot: {
      propDefinition: [
        scrapeninja,
        "screenshot",
      ],
      optional: true,
    },
    catchAjaxHeadersUrlMask: {
      propDefinition: [
        scrapeninja,
        "catchAjaxHeadersUrlMask",
      ],
      optional: true,
    },
    viewportWidth: {
      propDefinition: [
        scrapeninja,
        "viewportWidth",
      ],
      optional: true,
    },
    viewportHeight: {
      propDefinition: [
        scrapeninja,
        "viewportHeight",
      ],
      optional: true,
    },
    viewportDeviceScaleFactor: {
      propDefinition: [
        scrapeninja,
        "viewportDeviceScaleFactor",
      ],
      optional: true,
    },
    viewportHasTouch: {
      propDefinition: [
        scrapeninja,
        "viewportHasTouch",
      ],
      optional: true,
    },
    viewportIsMobile: {
      propDefinition: [
        scrapeninja,
        "viewportIsMobile",
      ],
      optional: true,
    },
    viewportIsLandscape: {
      propDefinition: [
        scrapeninja,
        "viewportIsLandscape",
      ],
      optional: true,
    },
    extractor: {
      propDefinition: [
        scrapeninja,
        "extractor",
      ],
      optional: true,
    },
  },
  async run({ $ }) {
    try {
      const viewport = clearObj({
        width: this.viewportWidth,
        height: this.viewportHeight,
        deviceScaleFactor: this.viewportDeviceScaleFactor,
        hasTouch: this.viewportHasTouch,
        isMobile: this.viewportIsMobile,
        isLandscape: this.viewportIsLandscape,
      });

      const data = clearObj({
        url: this.url,
        waitForSelector: this.waitForSelector,
        postWaitTime: this.postWaitTime,
        dumpIframe: this.dumpIframe,
        waitForSelectorIframe: this.waitForSelectorIframe,
        extractorTargetIframe: this.extractorTargetIframe,
        headers: parseObject(this.headers),
        retryNum: this.retryNum,
        geo: this.geo,
        proxy: this.proxy,
        timeout: this.timeout,
        textNotExpected: parseObject(this.textNotExpected),
        statusNotExpected: parseObject(this.statusNotExpected),
        blockImages: this.blockImages,
        blockMedia: this.blockMedia,
        screenshot: this.screenshot,
        catchAjaxHeadersUrlMask: this.catchAjaxHeadersUrlMask,
        extractor: this.extractor,
      });

      if (Object.entries(viewport).length) {
        data.viewport = viewport;
      }

      const response = await this.scrapeninja.scrapeJs({
        $,
        data,
      });

      $.export("$summary", `Successfully scraped ${this.url} with JS rendering`);
      return response;
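    } catch (err) {
      // Errors without an HTTP response body (e.g. network failures) are rethrown unchanged;
      // destructuring `response.data` directly in the catch clause would throw a TypeError here.
      if (!err?.response?.data) throw err;
      throw new ConfigurationError(parseError(err.response.data));
    }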
  },
};
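
The `run` method relies on `clearObj` to drop unset props before the request is sent. The helper's source is not shown on this page, so the version below is a guess at its behavior (it removes `undefined`, `null`, and empty-string values). `buildScrapeJsBody` is a hypothetical, shortened stand-in for the body-building part of `run`, and it shows why `viewport` is attached only when at least one viewport prop is set.

```javascript
// Assumed behavior of clearObj: strip keys whose value is undefined, null, or "".
// The real helper lives in common/utils.mjs and may differ.
function clearObj(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(
      ([, value]) => value !== undefined && value !== null && value !== "",
    ),
  );
}

// Hypothetical, shortened version of the request-body assembly in run().
function buildScrapeJsBody(props) {
  const viewport = clearObj({
    width: props.viewportWidth,
    height: props.viewportHeight,
    isMobile: props.viewportIsMobile,
  });
  const data = clearObj({
    url: props.url,
    waitForSelector: props.waitForSelector,
    timeout: props.timeout,
    blockImages: props.blockImages,
  });
  // Mirror the action: only include viewport when something was configured.
  if (Object.entries(viewport).length) {
    data.viewport = viewport;
  }
  return data;
}

console.log(JSON.stringify(buildScrapeJsBody({ url: "https://example.com", blockImages: false })));
console.log(JSON.stringify(buildScrapeJsBody({ url: "https://example.com", viewportWidth: 1280 })));
```

Note that `false` survives the filter in this sketch, so a prop such as `blockImages: false` is still sent explicitly.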

Action Configuration#

This component may be configured based on the props defined in the component code. Pipedream automatically prompts for input values in the UI.

Label	Prop	Type	Description
ScrapeNinja	`scrapeninja`	`app`	This component uses the ScrapeNinja app.
URL	`url`	`string`	The URL to scrape.
Wait For Selector	`waitForSelector`	`string`	CSS selector to wait for before considering the page loaded.
Post Wait Time	`postWaitTime`	`integer`	Wait for the specified number of seconds after page load (from 1 to 12 s). Use this only if ScrapeNinja failed to wait for the required page elements automatically.
Dump Iframe	`dumpIframe`	`string`	If a particular iframe needs to be dumped, specify the value of its `name` HTML attribute here. The ScrapeNinja JS renderer will wait for the iframe's DOM elements to appear.
Wait For Selector Iframe	`waitForSelectorIframe`	`string`	If `Dump Iframe` is set, wait for this CSS selector inside the iframe.
Extractor Target Iframe	`extractorTargetIframe`	`boolean`	If `Dump Iframe` is set, run the JS extractor function against the iframe HTML instead of the base page body. This has no effect unless `Dump Iframe` is set.
Headers	`headers`	`string[]`	Custom headers to send with the request. By default, regular Chrome browser headers are sent to the target URL.
Retry Number	`retryNum`	`integer`	Number of attempts.
Geo	`geo`	`string`	Geo location for basic proxy pools (you can purchase premium ScrapeNinja proxies for wider country selection and higher proxy quality). Read more about ScrapeNinja proxy setup
Proxy	`proxy`	`string`	Premium or your own proxy URL (overrides `Geo` prop). Read more about ScrapeNinja proxy setup
Timeout	`timeout`	`integer`	Timeout per attempt, in seconds. Each retry will take [timeout] number of seconds.
Text Not Expected	`textNotExpected`	`string[]`	Text which will trigger a retry from another proxy address.
Status Not Expected	`statusNotExpected`	`integer[]`	HTTP response statuses which will trigger a retry from another proxy address.
Block Images	`blockImages`	`boolean`	Block images from loading. This will speed up page loading and reduce bandwidth usage.
Block Media	`blockMedia`	`boolean`	Block media (CSS, fonts) from loading. This will speed up page loading and reduce bandwidth usage.
Screenshot	`screenshot`	`boolean`	Take a screenshot of the page. Pass "false" to increase the speed of the request.
Catch Ajax Headers URL Mask	`catchAjaxHeadersUrlMask`	`string`	Useful to dump some XHR response. Pass URL mask here. For example, if you need to catch all requests to https://example.com/api/data.json, pass "api/data.json" here. In response, you will get new property `.info.catchedAjax` with the XHR response data - { url, method, headers[], body , status, responseHeaders{} }
Viewport Width	`viewportWidth`	`integer`	Width of the viewport.
Viewport Height	`viewportHeight`	`integer`	Height of the viewport.
Viewport Device Scale Factor	`viewportDeviceScaleFactor`	`integer`	Device scale factor for the viewport.
Viewport Has Touch	`viewportHasTouch`	`boolean`	Whether the viewport has touch capabilities.
Viewport Is Mobile	`viewportIsMobile`	`boolean`	Whether the viewport is mobile.
Viewport Is Landscape	`viewportIsLandscape`	`boolean`	Whether the viewport is in landscape mode.
Extractor	`extractor`	`string`	Custom JS function to extract JSON values from scraped HTML. Write and test your own extractor at https://scrapeninja.net/cheerio-sandbox/
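
The `extractor` prop takes a JS function as a string. The sketch below assumes the `extract(input, cheerio)` signature used in the ScrapeNinja cheerio sandbox, which should be confirmed there before relying on it. The selectors and the returned field names are hypothetical.

```javascript
// Assumed extractor shape: ScrapeNinja calls extract() with the page HTML
// and a cheerio instance. Confirm the signature in the cheerio sandbox.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  // Hypothetical fields: the page title and the first top-level heading.
  return {
    title: $("title").text().trim(),
    heading: $("h1").first().text().trim(),
  };
}

// The prop expects a string, so the function source is what gets configured.
const extractorProp = extract.toString();
console.log(extractorProp.split("\n")[0]);
```

Keeping the extractor as a real function while developing makes it easy to test locally, and `toString()` produces the text to paste into the Extractor field.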

Action Authentication#

ScrapeNinja uses API keys for authentication. When you connect your ScrapeNinja account, Pipedream securely stores the keys so you can easily authenticate to ScrapeNinja APIs in both code and no-code steps.

Using ScrapeNinja in Pipedream

1. Create a RapidAPI account: Begin by signing up for a RapidAPI account.
2. Access your API key:
- Once registered, you can interact with ScrapeNinja using your RapidAPI key.
- Open the ScrapeNinja documentation on RapidAPI and locate your API key, labeled X-RapidAPI-Key.
- Copy this key and paste it into the rapid_api_key field below.
3. Subscribe to the API: Finally, click Subscribe to Test in the RapidAPI console to subscribe to the ScrapeNinja API.
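
Outside of Pipedream, the same key can be used to call the API directly. The sketch below only builds the request and does not send it. The host `scrapeninja.p.rapidapi.com`, the `/scrape-js` path, and the `X-RapidAPI-Host` header are assumptions based on RapidAPI's usual conventions and should be checked against the API listing in the RapidAPI console. `buildScrapeJsRequest` is a hypothetical helper name.

```javascript
// Assumed RapidAPI host for ScrapeNinja; verify it in the RapidAPI console.
const RAPIDAPI_HOST = "scrapeninja.p.rapidapi.com";

// Hypothetical helper: returns a URL and fetch-style options without sending anything.
function buildScrapeJsRequest(rapidApiKey, body) {
  if (!rapidApiKey) {
    throw new Error("A RapidAPI key is required");
  }
  return {
    url: `https://${RAPIDAPI_HOST}/scrape-js`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": rapidApiKey,
        "X-RapidAPI-Host": RAPIDAPI_HOST,
      },
      body: JSON.stringify(body),
    },
  };
}

const request = buildScrapeJsRequest("YOUR_RAPIDAPI_KEY", {
  url: "https://example.com",
  waitForSelector: "h1",
});
console.log(request.options.method, request.url);
// To send it on Node 18+ (inside an async function):
//   const res = await fetch(request.url, request.options);
```

Separating request construction from sending keeps the key handling easy to inspect and test without spending API quota.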

About ScrapeNinja#

Extract Web Data on Scale
