Extract Web Data at Scale
Uses ScrapeNinja's real Chrome browser engine to scrape pages that require JS rendering. See the documentation
Write Python and use any of the 350k+ PyPI packages available. Refer to the Pipedream Python docs to learn more.
Use the high-performance web scraping endpoint with a Chrome browser TLS fingerprint, but without JavaScript execution or real-browser overhead. See the documentation
ScrapeNinja API on Pipedream allows you to craft powerful serverless workflows for web scraping without the hassle of managing proxies or browsers. It's a tool that can extract data from websites, handling JavaScript rendering and anti-bot measures with ease. By integrating ScrapeNinja with Pipedream, you can automate data collection, collate and process the scraped data, and connect it to numerous other services for further analysis, alerting, or storage. For example, the following Node.js component calls the non-JS /scrape endpoint to fetch the Hacker News front page:
import { axios } from '@pipedream/platform';

export default defineComponent({
  props: {
    // Connect your ScrapeNinja (RapidAPI) account to this step
    scrapeninja: {
      type: "app",
      app: "scrapeninja",
    },
  },
  async run({ steps, $ }) {
    // POST the target URL to the ScrapeNinja /scrape endpoint via RapidAPI
    return await axios($, {
      method: 'POST',
      url: 'https://scrapeninja.p.rapidapi.com/scrape',
      headers: {
        'content-type': 'application/json',
        'X-RapidAPI-Key': this.scrapeninja.$auth.rapid_api_key,
        'X-RapidAPI-Host': 'scrapeninja.p.rapidapi.com',
      },
      data: {
        // Target page to scrape
        url: "https://news.ycombinator.com/",
      },
    });
  },
});
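Because the axios helper from @pipedream/platform returns the parsed response body rather than a full response object, the step's return value is ScrapeNinja's JSON payload; this typically carries the fetched page's HTML in a body field alongside response metadata such as status code and headers in an info field, and it is exposed to subsequent steps as the step's return value.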
Develop, run, and deploy your Python code in Pipedream workflows. Integrate seamlessly between no-code steps, with connected accounts, or integrate Data Stores and manipulate files within a workflow. This includes installing PyPI packages from within your code, without having to manage a requirements.txt file or run pip.
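As a minimal sketch of that behavior: importing a PyPI package at the top of a code step is all it takes for Pipedream to install it on deploy (requests is used here purely as an example package):
import requests  # a PyPI package; installed automatically from this import

def handler(pd: "pipedream"):
    # No requirements.txt to manage and no pip to run
    resp = requests.get("https://news.ycombinator.com/")
    return {"status_code": resp.status_code}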
Below is an example of using Python to access data from the workflow's trigger and share it with subsequent workflow steps:
def handler(pd: "pipedream"):
    # Reference data from previous steps
    print(pd.steps["trigger"]["context"]["id"])
    # Return data for use in future steps
    return {"foo": {"test": True}}