Is it Possible to Use Selenium in Pipedream?

user-1 · October 30, 2023, 8:02pm

This topic was automatically generated from Slack. You can find the original thread here.

Hey - I’m trying to find out whether it’s possible to use Selenium inside Pipedream. There are a few threads on this in the community already, but they don’t seem very conclusive: "Can Selenium be Used with Python in PipeDream?" (#2 by user-2) and "Has Pipedream with Selenium and a Headless Browser Been Used to Webscrape and Login to a Website?". I’ve been guessing at routes forward, so any pointers, or an explanation of why this isn’t possible, are appreciated.

The main issue seems to be that Pipedream doesn’t provide a pre-installed webdriver for Selenium to automate, as is normal practice. I’ve tried installing the Firefox webdriver that comes included with Selenium, but I run into this error:

Read-only file system (os error 30)

This is the test I tried:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

def handler(pd: "pipedream"):
  # Reference data from previous steps
  print(pd.steps["trigger"]["context"]["id"])

  head_options = Options()
  head_options.add_argument('-headless')  # run Firefox without opening a window

  driver = webdriver.Firefox(options=head_options)
  driver.get("http://www.python.org")
  assert "Python" in driver.title
  elem = driver.find_element(By.NAME, "q")
  elem.clear()
  elem.send_keys("pycon")
  elem.send_keys(Keys.RETURN)
  assert "No results found." not in driver.page_source
  driver.quit()  # end the whole browser session, not just the current window

  # Return data for use in future steps
  return {"foo": {"test": True}}

I also tried using the chromeless package from PyPI:

from chromeless import Chromeless
import os

os.environ['AWS_DEFAULT_REGION'] = 'eu-west-2'

# Define Selenium method
def get_title(self, url):
    self.get(url)
    return self.title

def handler(pd: "pipedream"):
  # Reference data from previous steps
  print(pd.steps["trigger"]["context"]["id"])

  # Attach the method and call it
  chrome = Chromeless()
  chrome.attach(get_title)
  print(chrome.get_title("https://google.com")) # Returns Google

  # Return data for use in future steps
  return {"foo": {"test": True}}

I figured there might be more success with this since Pipedream runs on AWS Lambda, but no: it fails with authentication errors. I could potentially use Pipedream to invoke a Lambda function and run chromeless there, but that seems convoluted.

Is this doable? Can Pipedream run Selenium?

user-1 · October 30, 2023, 8:02pm

can you help?

user-1 · October 30, 2023, 8:02pm

I understand that you’re trying to use Selenium within Pipedream and are running into problems. Unfortunately, running Selenium with a locally installed webdriver directly in Pipedream is not supported due to the limitations of the environment. One alternative is to switch to a browser automation library such as Puppeteer or Playwright. Another is to create a separate Lambda function that runs your Selenium script, and then invoke that Lambda function from your Pipedream workflow using the AWS SDK. This way, your Selenium script runs in a separate environment that supports webdrivers while still being integrated with your Pipedream workflow. Let me know if you need any help with setting up the Lambda function or invoking it from Pipedream.
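
For reference, invoking a Lambda function from a Pipedream Python step could look roughly like the sketch below. It assumes boto3 is available to the step, that AWS credentials and a region are already configured for it, and that a function named `selenium-scraper` (a hypothetical name) accepts a JSON payload with a `url` key and returns JSON. None of those details come from this thread.

```python
import json


def invoke_scraper(client, function_name, url):
    """Synchronously invoke a Lambda function and decode its JSON response."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps({"url": url}).encode("utf-8"),
    )
    # boto3 exposes the function's result as a readable stream under "Payload"
    return json.loads(response["Payload"].read())


def handler(pd: "pipedream"):
    # Imported here so invoke_scraper can be exercised with a stub client
    import boto3

    # Credentials and region are assumed to come from the environment
    client = boto3.client("lambda")
    # "selenium-scraper" is a hypothetical function name
    result = invoke_scraper(client, "selenium-scraper", "http://www.python.org")
    return {"scrape": result}
```

Passing the client in as an argument keeps the invocation logic separate from credential handling, so it can be checked with a stub before any real AWS call is made.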

user-1 · October 30, 2023, 8:02pm

we actually use selenium for some scraping, but we recently launched support for Puppeteer and Playwright — here’s our docs!

user-1 · October 30, 2023, 8:02pm

thanks! Not sure how I missed this. Unfortunately I’ve already got quite a lot of selenium code and it would be annoying to convert it all to Playwright. Do you think there’s a way to leverage that webdriver support with Selenium? If not I’ll follow the Lambda route

user-1 · October 30, 2023, 8:02pm

I gotcha!
We use Chrome with Selenium, and this is our config:

    from selenium import webdriver

    # api_key holds your Browserless API token
    chrome_options = webdriver.ChromeOptions()
    chrome_options.set_capability('browserless:token', api_key)
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--headless")
    # Connect to a remote Chrome instead of launching a local browser
    driver = webdriver.Remote(
        command_executor='https://chrome.browserless.io/webdriver',
        options=chrome_options
    )

Try a similar setup, and let me know if it works
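
For anyone adapting this, here is a minimal sketch of how that config might fit into a complete Pipedream Python step. The helper names (`build_remote_driver`, `get_page_title`) and the `BROWSERLESS_API_KEY` environment variable are assumptions made for illustration; the endpoint and options are the ones quoted above, and the page is the one from the original test.

```python
import os

# Endpoint taken from the config quoted in this thread
BROWSERLESS_WEBDRIVER_URL = "https://chrome.browserless.io/webdriver"


def build_remote_driver(api_key, executor_url=BROWSERLESS_WEBDRIVER_URL):
    """Create a Selenium driver that controls a remote Chrome, not a local one."""
    # Imported here so get_page_title can be tested without selenium installed
    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    chrome_options.set_capability("browserless:token", api_key)
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--headless")
    return webdriver.Remote(command_executor=executor_url, options=chrome_options)


def get_page_title(driver, url):
    """Load a page and return its title."""
    driver.get(url)
    return driver.title


def handler(pd: "pipedream"):
    # BROWSERLESS_API_KEY is an assumed environment variable name
    driver = build_remote_driver(os.environ["BROWSERLESS_API_KEY"])
    try:
        title = get_page_title(driver, "http://www.python.org")
    finally:
        driver.quit()  # always release the remote browser session
    return {"title": title}
```

Because the browser runs on the remote service, the step never has to install a webdriver, which is presumably why this sidesteps the read-only file system error from the first attempt. Existing Selenium code should be able to reuse the returned `driver` largely unchanged.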

user-1 · October 30, 2023, 8:02pm

Wow, perfect. Thanks.