Why am I Getting '[Errno 30] Read-only File System' Error While Using Pyppeteer in Python?

This topic was automatically generated from Slack. You can find the original thread here.

guys, I’m trying to use puppeteer in python (pyppeteer) but I’m getting this error: [Errno 30] Read-only file system: '/home/sbx_user1051'
What might be the problem?

this is my code:

```python
import asyncio
from pyppeteer import launch

def handler(pd: "pipedream"):
    async def main():
        browserObj = await launch({"headless": True})
        url = await browserObj.newPage()
        await url.goto('https://scrapeme.live/shop/')
        htmlContent = await url.content()
        print(htmlContent)
        await browserObj.close()

    asyncio.get_event_loop().run_until_complete(main())
```

perhaps I’m not importing it correctly?

Hi, only the /tmp directory is writable within the workflow's execution environment.

Perhaps try configuring the Puppeteer instance to write its user data to /tmp instead of the default directory under /home.

We don’t officially support Puppeteer within Python steps yet, because it requires a Chromium instance that’s compatible with serverless environments.

But we do have Puppeteer available in Node.js code steps and pre-built actions.

can you show me how to use Puppeteer in Node.js code steps using the @pipedream/browsers package?

yeah I’ve tried using it with node js but it’s giving me tons of problems

Sure, here's an example of how you can use Puppeteer in Node.js code steps using the @pipedream/browsers package:

```javascript
import { launch } from "@pipedream/browsers";

export default defineComponent({
  async run({ steps, $ }) {
    const browser = await launch();
    const page = await browser.newPage();
    await page.goto("https://example.com");
    const title = await page.title();
    console.log(title);
    await browser.close();
    return title;
  },
});
```

In this example, we're launching a new browser instance, opening a new page, navigating to https://example.com, getting the title of the page, logging it to the console, and then closing the browser. The title of the page is also returned as the result of the step.

Please note that the @pipedream/browsers package is a wrapper around Puppeteer that's optimized for use in Pipedream's serverless environment. It provides the same API as Puppeteer, so you can refer to the Puppeteer documentation for more information on how to use it.

Can you describe some of the problems?

Did you upgrade the memory to at least 2 GB on the deployed workflows using Puppeteer?

I did. I'm trying to loop through an array of URLs and get each one's HTML code.
It's very hard to do because it's an async function.

It sounds like your workflow retrieves this list of URLs from a database of some kind perhaps.

Have you tried splitting the querying & iteration portion of the workflow from the individual processing of each URL?

my list of URLs is just a const I've set, for example:
const websites = ['https://www.google.com', 'https://www.facebook.com'];

Got it, same principle applies.

You could also use await within a for loop to await each iteration, or use Promise.all to await all the promises at once.

then I’m using forEach to scrape each one

I'll try Promise.all since await just doesn't work for me

It's really confusing in Node.js, but you can't just make the forEach callback function async to make it wait for each promise serially.

You have to use a plain for (const url of websites) loop with await inside it for that kind of one-at-a-time processing.
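To make the difference concrete, here's a minimal sketch contrasting the two approaches. It uses a hypothetical stand-in scrape function instead of Puppeteer so the snippet is self-contained and runs anywhere:

```javascript
// Stand-in for a real Puppeteer page.content() fetch (hypothetical helper).
const scrape = async (url) => `<html>${url}</html>`;

const websites = ['https://www.google.com', 'https://www.facebook.com'];

// BROKEN: forEach fires every async callback and moves on immediately,
// so nothing has been pushed into results by the time the function returns.
async function withForEach() {
  const results = [];
  websites.forEach(async (url) => {
    results.push(await scrape(url));
  });
  return results.length; // 0, the pushes happen later
}

// WORKS: a plain for...of loop awaits each URL one at a time, in order.
async function withForOf() {
  const results = [];
  for (const url of websites) {
    results.push(await scrape(url));
  }
  return results;
}
```

The forEach version doesn't error, it just silently returns before any of the scraping has finished, which is why it feels like await "doesn't work" there.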

But Promise.all would be better in your case anyway, since it launches all the requests at once, making it faster.
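The concurrent pattern looks like this, again sketched with a hypothetical stand-in scrape function rather than Puppeteer so it's self-contained:

```javascript
// Stand-in for a real Puppeteer fetch (hypothetical helper).
const scrape = async (url) => `<html>${url}</html>`;

const websites = ['https://www.google.com', 'https://www.facebook.com'];

async function scrapeAll() {
  // map() kicks off every scrape immediately; Promise.all resolves once
  // all of them have finished, preserving the input order of the array.
  return Promise.all(websites.map((url) => scrape(url)));
}
```

The results array comes back in the same order as the input URLs, regardless of which request finishes first.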

given an array of URLs, can you show me how to map over each one and use the puppeteer export from @pipedream/browsers to retrieve the HTML content from each URL, passing the array of Promises to Promise.all so the step execution waits until the promises are finished before code control moves to the next step in the workflow?

Sure, here's an example of how you can use the Puppeteer export from @pipedream/browsers to retrieve the HTML content from each URL in an array of Promises passed to a Promise.all. This ensures the step execution waits until all the promises are finished before moving to the next step in the workflow:

```javascript
import { launch } from "@pipedream/browsers";

export default defineComponent({
  async run({ steps, $ }) {
    const websites = ['https://www.google.com', 'https://www.facebook.com'];
    const browser = await launch();
    const promises = websites.map(async (url) => {
      const page = await browser.newPage();
      await page.goto(url);
      return await page.content();
    });
    const htmlContents = await Promise.all(promises);
    await browser.close();
    return htmlContents;
  },
});
```

In this example, we're launching a new browser instance, mapping over each URL in the websites array, opening a new page for each URL, navigating to it, and getting the HTML content of the page, which adds one Promise per URL to the promises array. We then use Promise.all to wait for all the Promises to resolve, storing the resulting array of HTML contents in htmlContents. Finally, we close the browser and return htmlContents as the result of the step.
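One caveat with the approach above: Promise.all rejects as soon as any single promise rejects, so one unreachable URL would fail the whole step. If you'd rather get a result slot per URL regardless of individual failures, Promise.allSettled is an option. A minimal sketch, using a hypothetical stand-in scraper instead of Puppeteer:

```javascript
// Stand-in scraper that fails for one of the URLs (hypothetical helper).
const scrape = async (url) => {
  if (url.includes('bad')) throw new Error(`could not reach ${url}`);
  return `<html>${url}</html>`;
};

const websites = ['https://www.google.com', 'https://bad.example'];

async function scrapeAllSettled() {
  const settled = await Promise.allSettled(websites.map(scrape));
  // Each entry is { status: 'fulfilled', value } or { status: 'rejected', reason };
  // map failures to null so the step still returns one slot per URL.
  return settled.map((r) => (r.status === 'fulfilled' ? r.value : null));
}
```

This keeps the concurrency of Promise.all while letting you inspect or retry just the URLs that failed.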

That should do the trick