Why am I Getting '[Errno 30] Read-only File System' Error While Using Pyppeteer in Python?

user-1 · November 29, 2023, 4:38pm

it says “The requested module ‘@pipedream/browsers’ does not provide an export named ‘launch’”

user-1 · November 29, 2023, 4:38pm

can you correct the import from the code sample above? @pipedream/browsers exports a puppeteer module, not a launch module

user-1 · November 29, 2023, 4:38pm

I apologize for the confusion earlier. You’re correct, the @pipedream/browsers package exports a puppeteer object, not a launch function. Here’s the corrected code:

```javascript
import puppeteer from "@pipedream/browsers";

export default defineComponent({
  async run({ steps, $ }) {
    const websites = ['https://www.google.com', 'https://www.facebook.com'];
    const browser = await puppeteer.launch();

    const promises = websites.map(async (url) => {
      const page = await browser.newPage();
      await page.goto(url);
      return await page.content();
    });

    const htmlContents = await Promise.all(promises);
    await browser.close();

    return htmlContents;
  },
});
```

In this corrected version, we’re importing puppeteer from @pipedream/browsers, and then using puppeteer.launch() to launch a new browser instance. The rest of the code remains the same.

user-1 · November 29, 2023, 4:38pm

That’s still not quite correct, it’s a named export:

```javascript
import { puppeteer } from '@pipedream/browsers'
```

user-1 · November 29, 2023, 4:38pm

looks like it’s working, but when I actually add all my URLs to that array (there’s 17 of them), it returns this error: `Navigation timeout of 30000 ms exceeded`

user-1 · November 29, 2023, 4:38pm

Can you share your component code? Are you waiting for any specific selectors?

user-1 · November 29, 2023, 4:38pm

```javascript
import { puppeteer } from '@pipedream/browsers';

export default defineComponent({
  async run({ steps, $ }) {
    const websites = [
      'https://chutzlaaretz.com', 'https://walcottradio.com', 'https://userway.org',
      'https://eyeclick.com', 'https://rivkazaide.co.il', 'https://claimjusticerefund.com',
      'https://totalmerchantresources.com', 'https://verbit.ai', 'https://verbit.co',
      'https://go.verbit.ai', 'https://vitac.com', 'https://diplomatline.co.il',
      'https://kmslh.com', 'https://aman.co.il', 'https://amandigital.co.il',
      'https://rasner.co.il', 'https://dataloop.ai',
    ];
    const browser = await puppeteer.launch();

    const promises = websites.map(async (url) => {
      const page = await browser.newPage();
      await page.goto(url);
      return await page.content();
    });

    const htmlContents = await Promise.all(promises);
    await browser.close();

    return htmlContents;
  },
});
```

user-1 · November 29, 2023, 4:38pm

I wonder if one website in particular is slow to load or isn’t responding in time. Perhaps try adding a `console.log(url)` within the loop to identify which one fails.
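A minimal sketch of that logging idea, assuming `browser` is the object returned by `puppeteer.launch()` in the component above; `scrapeWithLogging` is a hypothetical helper name. Because `Promise.all` starts every navigation at once, a single log before `goto` would print all 17 URLs immediately, so this version logs both when each URL starts and when it finishes. The URL with a `start:` line but no matching `done:` line is the one hanging.

```javascript
// Sketch: log the start and end of each navigation so the URL that never
// finishes stands out. `browser` is assumed to come from puppeteer.launch().
async function scrapeWithLogging(browser, websites) {
  const promises = websites.map(async (url) => {
    console.log(`start: ${url}`);
    const page = await browser.newPage();
    await page.goto(url);
    const html = await page.content();
    console.log(`done: ${url}`); // never printed for the URL that times out
    return html;
  });
  return Promise.all(promises);
}
```

Inside `run`, this would replace the `map`/`Promise.all` lines with `const htmlContents = await scrapeWithLogging(browser, websites);`.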

user-1 · November 29, 2023, 4:38pm

Or wrapping the scraping code with a try…catch to catch the exception and log which website is the culprit
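A sketch of the `try...catch` approach, again assuming `browser` comes from `puppeteer.launch()`; `scrapeSafely` and its `{ url, html }` / `{ url, error }` result shape are illustrative choices, not part of `@pipedream/browsers`. Each URL handles its own error, so one timeout no longer rejects the whole `Promise.all`, and the failing site is named both in the log and in the returned array.

```javascript
// Sketch: catch errors per URL so one slow or broken site doesn't reject
// the whole Promise.all, and record which URL failed.
async function scrapeSafely(browser, websites) {
  const promises = websites.map(async (url) => {
    const page = await browser.newPage();
    try {
      await page.goto(url);
      return { url, html: await page.content() };
    } catch (err) {
      console.error(`Failed to load ${url}: ${err.message}`);
      return { url, error: err.message };
    } finally {
      await page.close(); // free the tab whether or not navigation succeeded
    }
  });
  return Promise.all(promises);
}
```

Filtering the result with `results.filter((r) => r.error)` would then list just the culprits.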

user-1 · November 29, 2023, 4:38pm

But again, splitting the workflow into two helps diagnose these issues too, since each event is dedicated to a single record.
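Short of splitting the workflow, a rough in-step analog of handling one record at a time is to reuse a single page and visit the URLs sequentially. This is a sketch under the same `browser` assumption; `scrapeSequentially` is a hypothetical helper. It is slower than `Promise.all`, but only one navigation runs at once, so any failure clearly belongs to the URL currently being processed.

```javascript
// Sketch: reuse one page and visit the URLs one at a time. Slower than
// Promise.all, but only one navigation is in flight at any moment.
async function scrapeSequentially(browser, websites) {
  const page = await browser.newPage();
  const results = [];
  try {
    for (const url of websites) {
      try {
        await page.goto(url);
        results.push({ url, html: await page.content() });
      } catch (err) {
        // Record the failure and move on to the next URL.
        results.push({ url, error: err.message });
      }
    }
  } finally {
    await page.close();
  }
  return results;
}
```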

user-1 · November 29, 2023, 4:38pm

check it out

user-1 · November 29, 2023, 4:38pm

I added `{timeout: 0}` to the `goto` function

user-1 · November 29, 2023, 4:38pm

```javascript
await page.goto(url, {timeout: 0});
```
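Instead of passing the option to every `goto()` call, Puppeteer pages also provide `page.setDefaultNavigationTimeout()`, which sets the default for later navigations on that page. A small sketch, assuming the pages from `@pipedream/browsers` are standard Puppeteer pages and `browser` comes from `puppeteer.launch()`; `fetchHtml` is a hypothetical helper name.

```javascript
// Sketch: configure the navigation timeout once per page instead of on
// every goto(). A value of 0 disables the timeout, like { timeout: 0 }.
async function fetchHtml(browser, url, navigationTimeoutMs = 0) {
  const page = await browser.newPage();
  try {
    page.setDefaultNavigationTimeout(navigationTimeoutMs);
    await page.goto(url); // picks up the page-level default set above
    return await page.content();
  } finally {
    await page.close();
  }
}
```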

user-1 · November 29, 2023, 4:38pm

now it works

user-1 · November 29, 2023, 4:38pm

Nice find! I’m not entirely sure what that option does exactly; I’m not seeing a clear definition in the docs for that method:

github.com/puppeteer/puppeteer/blob/puppeteer-v19.8.0/docs/api/puppeteer.frame.goto.md

**Frame.goto() method**

Navigates a frame to the given url.

Signature:

```typescript
class Frame {
  goto(
    url: string,
    options?: {
      referer?: string;
      referrerPolicy?: string;
      timeout?: number;
      waitUntil?: PuppeteerLifeCycleEvent | PuppeteerLifeCycleEvent[];
    }
```

(The embedded file is truncated at this point.)

user-1 · November 29, 2023, 4:38pm

Thanks for sharing!

By default, the maximum navigation timeout is 30 seconds. If, for some reason, your target URL requires more time to load (which seems unlikely), you can specify a `timeout: 0` option, which disables the navigation timeout entirely.
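Since `timeout: 0` removes the limit entirely, a site that never responds would keep the step waiting until something else (such as the workflow's own time limit) stops it. A possible middle ground, sketched with a hypothetical `gotoWithCap` helper, is a larger finite timeout plus `waitUntil: "domcontentloaded"`, so navigation resolves once the HTML has been parsed rather than after every resource loads. The 60-second default is an arbitrary illustration, not a value from the thread.

```javascript
// Sketch: cap each navigation at a finite timeout instead of disabling it.
// The 60 s default and the "domcontentloaded" wait condition are
// illustrative choices, not values taken from the thread.
async function gotoWithCap(page, url, timeoutMs = 60000) {
  try {
    await page.goto(url, { timeout: timeoutMs, waitUntil: "domcontentloaded" });
    return { url, ok: true };
  } catch (err) {
    // A timeout (or any other navigation error) lands here instead of
    // failing the whole step.
    return { url, ok: false, error: err.message };
  }
}
```

You would call `await gotoWithCap(page, url)` where the component currently calls `page.goto(url, {timeout: 0})`, and read `page.content()` only when `ok` is true.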

So one of the URLs is taking longer than 30 seconds to load, which suggests that site is very slow or struggling to respond (it did eventually load once the timeout was disabled).
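To confirm which site is the slow one, you could time each navigation. In this sketch the `timeNavigations` helper and its `slow` flag are hypothetical, and `browser` is again assumed to come from `puppeteer.launch()`. It records the elapsed milliseconds per URL and flags anything above a threshold that defaults to Puppeteer's 30-second navigation timeout. It disables the timeout so slow sites still get measured, which means a site that never responds would keep it waiting.

```javascript
// Sketch: time each navigation and flag the ones slower than a threshold
// (Puppeteer's 30 s default navigation timeout unless overridden).
async function timeNavigations(browser, websites, thresholdMs = 30000) {
  return Promise.all(
    websites.map(async (url) => {
      const page = await browser.newPage();
      const started = Date.now();
      try {
        // timeout: 0 disables the cap so slow sites still finish and get timed.
        await page.goto(url, { timeout: 0 });
        const elapsedMs = Date.now() - started;
        return { url, elapsedMs, slow: elapsedMs > thresholdMs };
      } finally {
        await page.close();
      }
    })
  );
}
```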