How to Grab Automatically Downloaded Files in Puppeteer?

This topic was automatically generated from Slack. You can find the original thread here.

How do I grab a file that is automatically downloaded in Puppeteer?

Puppeteer is a powerful tool for automating browser actions, but it doesn’t directly support grabbing files that are automatically downloaded. However, you can use Puppeteer to navigate to the page and trigger the download, and then use other tools or methods to access the downloaded file.

If you’re using Pipedream, you can use the Puppeteer app’s actions like “Get PDF”, “Get HTML”, “Get Page Title”, or “Screenshot a Page”. Here’s how to use these actions:

1. Search for the Puppeteer app from the step menu.
2. Select the action you want to use (e.g., “Get PDF”).
3. Connect your Puppeteer account.
4. Select a URL or pass a string from a previous step.
5. At the bottom of the step, click Test to run the action and test the configuration.

Note that these actions do not handle file downloads; they are used to interact with web pages and retrieve information. To handle file downloads, you will likely need a custom code step with Node.js and Puppeteer, which can be complex and may require advanced coding skills. For more detailed help, visit https://pipedream.com/support for more support options.

And Bard’s answer: Puppeteer provides two main approaches to handle automatically downloaded files:

1. Intercept and Save the Download: This method involves capturing the network response that delivers the file and saving its body. Here’s an example of how to capture and save a downloaded file:
```javascript
const puppeteer = require('puppeteer');
const fs = require('fs');

async function downloadFile(url, downloadPath) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com/download-page');

  // Start waiting for the file's response *before* clicking so it cannot be missed.
  const [response] = await Promise.all([
    page.waitForResponse((res) => res.url().startsWith(url)),
    page.click('#download-link'),
  ]);

  // Note: buffer() may fail if Chrome hands the response off to its download manager.
  const buffer = await response.buffer();
  await fs.promises.writeFile(downloadPath, buffer);

  await browser.close();
  console.log('File downloaded successfully!');
}
```
This code snippet waits for the response whose URL starts with url, reads its body, and saves it to the specified downloadPath.
2. Monitor Downloads and Save: Alternatively, you can monitor the download directory and pick up the downloaded file once it appears. This method is useful when you don’t know the filename or URL of the downloaded file beforehand. Here’s an example of how to monitor downloads:
```javascript
const puppeteer = require('puppeteer');
const fs = require('fs');
const path = require('path');

async function monitorDownloads(downloadDirectory) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const downloadDir = path.join(downloadDirectory, 'downloads');
  await fs.promises.mkdir(downloadDir, { recursive: true });

  // Tell Chrome to save downloads into downloadDir (public CDP session API).
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath: downloadDir });

  const startingFiles = await fs.promises.readdir(downloadDir);

  await page.goto('https://example.com/download-page');
  await page.click('#download-link');

  // Poll until a new, finished file appears.
  // In-progress Chrome downloads typically end in .crdownload, so skip those.
  const downloadedFileName = await new Promise((resolve) => {
    const intervalId = setInterval(async () => {
      const files = await fs.promises.readdir(downloadDir);
      const newFiles = files.filter(
        (file) => !startingFiles.includes(file) && !file.endsWith('.crdownload'),
      );
      if (newFiles.length > 0) {
        clearInterval(intervalId);
        resolve(newFiles[0]);
      }
    }, 100);
  });

  const downloadedFile = path.join(downloadDir, downloadedFileName);
  console.log('File downloaded:', downloadedFile);
  await browser.close();
  return downloadedFile;
}
```
This code snippet monitors the download directory, identifies the newly downloaded file, and returns its full path.

Which bot is telling the truth? :slightly_smiling_face:

I guess that’s for me to find out :slightly_smiling_face: thanks!

There’s this snippet that keeps showing up everywhere:

```javascript
await page._client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: downloadDir
});
```

but I can’t use it; I keep getting:

```
page._client.send is not a function
```

Try asking the bot: “Why does `await page._client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath: downloadDir });` throw the error `page._client.send is not a function`?”

The error you’re encountering is because page._client is not part of the public Puppeteer API. It’s an internal property, and it’s not recommended to use it in your code because it can change without notice between versions, leading to issues like the one you’re experiencing. In Puppeteer, to handle file downloads, you can listen for the network response that carries the file (the page.on('response') event or page.waitForResponse()) and save its body. However, this method might not work for all types of downloads. If you’re using Pipedream’s Puppeteer integration, you can use the following example to download a file:

```javascript
import { puppeteer } from '@pipedream/browsers';
import fs from 'fs';
import path from 'path';

export default defineComponent({
  async run({ $ }) {
    const browser = await puppeteer.browser();
    const page = await browser.newPage();

    // Navigate to the page that offers the download
    await page.goto('https://example.com/download-page');

    // Wait for the file's response while clicking, so the body is read before the browser closes
    const [response] = await Promise.all([
      page.waitForResponse((res) => res.url().endsWith('.pdf')), // adjust this condition based on your needs
      page.click('#download-link'), // adjust this selector based on your needs
    ]);

    const filePath = path.join('/tmp', 'downloaded.pdf');
    fs.writeFileSync(filePath, await response.buffer());

    await browser.close();
    return filePath; // returned so later steps can reference the saved file
  },
});
```
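If you still want Chrome itself to save downloads into a folder, the same DevTools Protocol command can be sent through Puppeteer’s public CDPSession API instead of the private page._client (in recent Puppeteer releases _client appears to be an internal method rather than an object, which would explain why .send is undefined on it). A minimal sketch, assuming your Puppeteer version provides page.target().createCDPSession() (newer releases also offer page.createCDPSession()); allowDownloads is a hypothetical helper name, and downloadDir is any writable directory you choose, such as /tmp on Pipedream:

```javascript
// Hypothetical helper: tell Chrome to save this page's downloads into downloadDir,
// using a public CDPSession rather than the private page._client.
async function allowDownloads(page, downloadDir) {
  // Open a DevTools Protocol session attached to this page's target.
  const client = await page.target().createCDPSession();
  // Page.setDownloadBehavior is marked deprecated in newer protocol versions
  // (Browser.setDownloadBehavior is the newer equivalent); swap it in if this stops working.
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',         // save files instead of blocking them
    downloadPath: downloadDir, // absolute directory to write them to
  });
  return client;
}
```

Call it after browser.newPage() and before triggering the download, e.g. `await allowDownloads(page, '/tmp');`, then read the new file from /tmp once it has finished.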

I have no idea about programming, but when I asked Bard, it wrote back: "The error “page._client.send is not a function” occurs because the _client property of the Page object is not publicly supported, and its behavior may change in future versions of Puppeteer. The _client property is used to access the Chrome DevTools Protocol (CDP) session associated with the page and provides a way to send CDP commands to the browser. However, this is an internal detail of Puppeteer’s implementation and is not intended for direct use by developers.

Instead of using page._client.send, use Puppeteer’s publicly supported download API. The page.download() method allows you to specify a callback function that will be called when the file is downloaded. The callback function will receive a Download object that contains information about the downloaded file, such as its URL, file name and MIME type. This information can then be used to save the file to the desired location.

Here’s an example of using the Page.download() method to download a file:

```javascript
page.on('download', (download) => {
  const downloadPath = `${downloadDir}/${download.filename}`;
  download.save(downloadPath);
});

await page.goto('https://example.com/file.pdf');
```
This code will download the file.pdf from the URL https://example.com/file.pdf and save it to the downloadDir directory. The downloadPath variable is used to construct the full path to the downloaded file.

Using the publicly supported Puppeteer API is more reliable and future-proof than using the _client property directly. The Page.download() method works in all versions of Puppeteer and will not be affected by changes to Puppeteer’s internal implementation."
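One caveat on the quoted answer: Puppeteer’s Page does not emit a 'download' event and has no page.download() method; page.on('download') with download.saveAs() is Playwright’s API. In Puppeteer, a public-API way to learn when a download finishes is to enable download events through the DevTools Protocol. The sketch below assumes Browser.setDownloadBehavior with eventsEnabled, sent over a browser-level CDPSession, and listens for the Browser.downloadWillBegin and Browser.downloadProgress events; clickAndWaitForDownload, selector, and downloadDir are hypothetical names.

```javascript
const path = require('path');

// Hypothetical helper: click `selector` and resolve with the path of the file
// Chrome saved into `downloadDir`. `browser` and `page` are existing Puppeteer objects.
async function clickAndWaitForDownload(browser, page, selector, downloadDir) {
  // Browser-level DevTools Protocol session (public Puppeteer API).
  const session = await browser.target().createCDPSession();
  await session.send('Browser.setDownloadBehavior', {
    behavior: 'allow',        // keep Chrome's suggested file name
    downloadPath: downloadDir,
    eventsEnabled: true,      // emit Browser.downloadWillBegin / Browser.downloadProgress
  });

  const suggestedNames = new Map(); // download guid -> suggested file name
  // Listeners are registered here, before the click that starts the download.
  const finished = new Promise((resolve, reject) => {
    session.on('Browser.downloadWillBegin', (event) => {
      suggestedNames.set(event.guid, event.suggestedFilename);
    });
    session.on('Browser.downloadProgress', (event) => {
      if (event.state === 'completed') {
        resolve(path.join(downloadDir, suggestedNames.get(event.guid)));
      } else if (event.state === 'canceled') {
        reject(new Error(`Download ${event.guid} was canceled`));
      }
    });
  });

  await page.click(selector); // triggers the download
  return finished;
}
```

For example, `const file = await clickAndWaitForDownload(browser, page, '#download-link', '/tmp');`. If a file with the same name already exists, Chrome may save under a modified name, so treat the returned path as a best guess; wrapping the wait in a timeout is also advisable.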

While waiting for answers from people who know programming, you can ask various AI chat assistants: GPT-4, Bard, Bing, Copilot, DeepSeek Coder… sometimes they can help.

Another alternative is that once logged in, you can grab the session cookie from Puppeteer, and then just use curl (or something similar) to download the file.
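A minimal sketch of that cookie approach, using Node’s built-in fetch (Node 18+) in place of curl. It assumes page is a Puppeteer Page that has already logged in; toCookieHeader and downloadWithSessionCookies are hypothetical helper names, and fileUrl/outPath are placeholders you supply. Some sites also check headers such as User-Agent, which this sketch does not copy over.

```javascript
const fs = require('fs');

// Hypothetical helper: turn Puppeteer cookie objects ({ name, value, ... }) into a Cookie header.
function toCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Hypothetical helper: download fileUrl with the logged-in session's cookies and save it.
async function downloadWithSessionCookies(page, fileUrl, outPath) {
  // page.cookies(url) returns the cookies that apply to that URL.
  const cookies = await page.cookies(fileUrl);
  const res = await fetch(fileUrl, {
    headers: { Cookie: toCookieHeader(cookies) },
  });
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);
  // Read the body as raw bytes so binary files are not corrupted.
  const bytes = Buffer.from(await res.arrayBuffer());
  await fs.promises.writeFile(outPath, bytes);
  return outPath;
}
```

For example, `await downloadWithSessionCookies(page, 'https://example.com/file.pdf', '/tmp/file.pdf');` after the login steps have run.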

This was a tricky one. I ended up figuring out that the actual form submission triggered the download, so I added this and it worked:

```javascript
// Re-submit the form from inside the page (sending its cookies) and read the response body as text
const result = await page.evaluate(async () => {
  const form = document.querySelector('form#loginForm');
  const data = new FormData(form);
  return fetch(form.action, {
    method: 'POST',
    credentials: 'include',
    body: data,
  })
    .then(response => response.text());
});
```
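Returning the body with response.text() works for text, but it decodes the bytes as UTF-8 and can corrupt binary files such as PDFs. A variant that moves the body out of the page as base64 and writes it to disk might look like the sketch below; submitFormAndSave and outPath are hypothetical names, and it assumes the form submission returns the file directly in its response.

```javascript
const fs = require('fs');

// Hypothetical helper: submit the form matched by formSelector from inside the page
// and save the response body to outPath without corrupting binary data.
async function submitFormAndSave(page, formSelector, outPath) {
  const base64 = await page.evaluate(async (selector) => {
    const form = document.querySelector(selector);
    const response = await fetch(form.action, {
      method: 'POST',
      credentials: 'include', // send the page's cookies
      body: new FormData(form),
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const blob = await response.blob();
    // readAsDataURL yields "data:<mime>;base64,<payload>"; keep only the payload.
    const dataUrl = await new Promise((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => resolve(reader.result);
      reader.onerror = () => reject(reader.error);
      reader.readAsDataURL(blob);
    });
    return dataUrl.slice(dataUrl.indexOf(',') + 1);
  }, formSelector);

  // Decode in Node and write the original bytes.
  await fs.promises.writeFile(outPath, Buffer.from(base64, 'base64'));
  return outPath;
}
```

In a Pipedream code step, writing to a path under /tmp (for example `await submitFormAndSave(page, 'form#loginForm', '/tmp/download.pdf')`) keeps the file available to later steps in the same run.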

Hi,
I have the same issue (I think).
I want to log in with a username and password here:
https://my.optimus.org.il/index.php?rp=/login

After logging in, I want to go here: https://my.optimus.org.il/dl.php?type=d&id=1

When I go there after logging in on a computer, I get the “Save file as” prompt.

I want Pipedream to do this for me and pass the file to another module in Pipedream, say, one that sends an email.

I’ve tried so many things, but nothing works.
What am I missing?