How to Modify Python Code That Uses Selenium WebDriver, Manual CAPTCHA Solving, and Review Scraping to Run in a Pipedream Python Component?

This topic was automatically generated from Slack. You can find the original thread here.

I need to run the following code in a Python component. Please modify it:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Set up the WebDriver (assuming Chrome, but can be adjusted for other browsers)
    driver = webdriver.Chrome()

    # Navigate to the page
    driver.get("https://www.g2.com/products/smartsheet/reviews")

    # Wait for the user to solve the CAPTCHA manually
    input("Please solve the CAPTCHA and press Enter to continue...")

    # Scrape the content after CAPTCHA is solved
    reviews = driver.find_elements(By.CLASS_NAME, "formatted-text")  # Adjust the selector as needed
    for review in reviews:
        print(review.text)

    # Close the WebDriver
    driver.quit()

To run the provided code in a Pipedream Python component, wrap it in a `handler` function, which Pipedream calls when the step runs. Here's the modified code:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By


def handler(pd: "pipedream"):
    # Set up the WebDriver (assuming Chrome, but can be adjusted for other browsers)
    driver = webdriver.Chrome()
    try:
        # Navigate to the page
        driver.get("https://www.g2.com/products/smartsheet/reviews")

        # Wait for the user to solve the CAPTCHA manually.
        # Note: input() reads from stdin, so this only works where an interactive terminal is attached.
        input("Please solve the CAPTCHA and press Enter to continue...")

        # Scrape the content after CAPTCHA is solved
        reviews = driver.find_elements(By.CLASS_NAME, "formatted-text")  # Adjust the selector as needed
        for review in reviews:
            print(review.text)
    finally:
        # Close the WebDriver even if a step above fails
        driver.quit()
```
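
The handler above only prints each review. A small variation, sketched here on the assumption that later steps need the review text, collects the text into a list and returns it from `handler`, which is how a Pipedream Python step exports data. `collect_review_texts` and `scrape_reviews` are hypothetical helper names, not part of Selenium or Pipedream.

```python
def collect_review_texts(elements):
    """Return the non-empty, stripped text of each scraped element."""
    texts = []
    for element in elements:
        text = (element.text or "").strip()
        if text:
            texts.append(text)
    return texts


def scrape_reviews(driver, url, class_name="formatted-text"):
    # Imported here so collect_review_texts can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    return collect_review_texts(driver.find_elements(By.CLASS_NAME, class_name))


def handler(pd: "pipedream"):
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        reviews = scrape_reviews(driver, "https://www.g2.com/products/smartsheet/reviews")
    finally:
        # Close the browser whether or not scraping succeeded
        driver.quit()
    # Returned values are exported from the step for later steps to use.
    return {"reviews": reviews}
```

Keeping the text extraction in its own function means it can be checked with plain objects that have a `.text` attribute, without launching a browser.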

Cache folder (/home/sbx_user1051/.cache/selenium) cannot be created: Read-only file system (os error 30)

It seems like you're encountering an issue with Selenium trying to create a cache folder on a read-only file system. In Pipedream, the file system is mostly read-only, except for the /tmp directory, which is writable. To resolve this issue, you can configure Selenium to use the /tmp directory for its cache. Here is an example of how you can set this up in a Pipedream Node.js component:

    import { defineComponent } from "@pipedream/types";
    import { Builder, By, Key, until } from "selenium-webdriver";
    import chrome from "selenium-webdriver/chrome";

    export default defineComponent({
      async run({ steps, $ }) {
        const options = new chrome.Options();
        options.setUserPreferences({
          "profile.default_content_settings.popups": 0,
          "download.default_directory": "/tmp",
          "download.prompt_for_download": false,
          "safebrowsing.enabled": true,
        });

        // Set the cache directory to /tmp
        options.addArguments(`--user-data-dir=/tmp`);

        let driver = await new Builder()
          .forBrowser("chrome")
          .setChromeOptions(options)
          .build();

        try {
          await driver.get("https://example.com");
          // Perform your Selenium operations here
        } finally {
          await driver.quit();
        }
      },
    });

In this example, the --user-data-dir=/tmp argument is added to the Chrome options so that Chrome's cache and user data are stored in the writable /tmp directory. Note that this flag controls Chrome's own profile directory. The ~/.cache/selenium folder named in the error appears to belong to Selenium Manager, so it may need to be redirected separately. If you need further assistance, please let me know!
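
Because the original question is about a Python component, the same idea can be sketched in Python. The sketch makes two assumptions that are not in the thread. First, it assumes the cache folder in the error message is the one Selenium Manager uses, which can be redirected with the `SE_CACHE_PATH` environment variable. Second, it assumes Chrome should run headless with `--no-sandbox`. The helper names (`prepare_writable_dirs`, `chrome_arguments`, `build_driver`) and the `selenium` and `chrome-profile` subfolders are hypothetical.

```python
import os


def prepare_writable_dirs(base_dir="/tmp"):
    """Point Selenium's cache at a writable folder and return Chrome's profile dir."""
    cache_dir = os.path.join(base_dir, "selenium")
    profile_dir = os.path.join(base_dir, "chrome-profile")
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(profile_dir, exist_ok=True)
    # Assumption: Selenium Manager reads SE_CACHE_PATH instead of ~/.cache/selenium.
    os.environ["SE_CACHE_PATH"] = cache_dir
    return profile_dir


def chrome_arguments(profile_dir):
    """Command-line flags for Chrome; adjust them to the environment."""
    return [
        "--headless=new",  # assumption: no display is available
        "--no-sandbox",  # assumption: the sandbox cannot start in this environment
        f"--user-data-dir={profile_dir}",
    ]


def build_driver(base_dir="/tmp"):
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    # The environment variable must be set before the driver is created.
    profile_dir = prepare_writable_dirs(base_dir)
    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(profile_dir):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Inside a `handler`, `driver = build_driver()` would replace the bare `webdriver.Chrome()` call. Whether Chrome itself is available in the component is a separate question that this sketch does not address.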