Is it Possible to Run Custom Sources from a VPC for Database Polling and Event Emission?

This topic was automatically generated from Slack. You can find the original thread here.

Can you run custom sources from a VPC? My use case is to poll the database and emit rows as events.

Based on the information provided in the Pipedream documentation, event sources in Pipedream can be used to run any Node code on a schedule, which includes the ability to poll a service or API for data and emit that data as events. These events can then trigger Pipedream workflows and can be retrieved using Pipedream’s REST API or SSE interface.

However, the documentation does not specifically address running custom sources from a Virtual Private Cloud (VPC). Typically, running code from a VPC involves considerations around network access and security, which are not detailed in the provided documentation.

For your use case of polling a database and emitting rows as events, you would be able to create a custom event source in Pipedream that polls your database at scheduled intervals. You would write Node.js code that connects to your database, queries for new or updated rows, and emits these rows as events in Pipedream. This setup would leverage the capabilities of event sources as described in the Pipedream documentation.

If your database is hosted within a VPC and you need to ensure that the polling happens from within the same VPC for security or network access reasons, this specific requirement isn’t covered in the provided documentation. You might need to consider network configurations that allow secure access to your database from Pipedream’s infrastructure, such as setting up a VPN, using VPC peering, or configuring your database’s firewall rules to allow connections from Pipedream’s IP addresses.

For more detailed guidance on setting up a custom event source to poll your database and emit events, and to discuss the specifics of running code from a VPC, I recommend reaching out to Pipedream’s support for more personalized assistance. You can visit Support - Pipedream for more support options.
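To make the polling idea above concrete, here is a minimal sketch of a timer-driven source. It assumes the database is reachable from Pipedream's network, which is the open question in this thread. The `queryRowsAfter` method and the `lastId` key are hypothetical names introduced for illustration. The timer and `db` props and the `this.$emit` call follow the Pipedream component API as I understand it, so check them against the current component docs before relying on them.

```javascript
// Sketch of a timer-driven Pipedream source. In a real component file this
// object would be the module's default export (`export default source`).
const source = {
  name: "Poll Database Rows (sketch)",
  version: "0.0.1",
  dedupe: "unique", // skip events whose `id` has already been emitted
  props: {
    // Run on a schedule; the 15-minute interval is an arbitrary assumption.
    timer: { type: "$.interface.timer", default: { intervalSeconds: 15 * 60 } },
    // Built-in key-value state that persists between runs of the source.
    db: "$.service.db",
  },
  methods: {
    // HYPOTHETICAL placeholder: replace with a real query through your
    // database client, e.g. "SELECT ... WHERE id > ? ORDER BY id LIMIT 500".
    async queryRowsAfter(lastId) {
      return [];
    },
  },
  async run() {
    // Resume from the last row ID emitted by a previous run (0 on first run).
    const lastId = (await this.db.get("lastId")) ?? 0;
    const rows = await this.queryRowsAfter(lastId);

    // Emit each row as its own event; `id` is what deduplication keys on.
    for (const row of rows) {
      this.$emit(row, { id: row.id, summary: `Row ${row.id}`, ts: Date.now() });
    }

    // Remember where this run stopped so the next run only sees newer rows.
    if (rows.length > 0) {
      await this.db.set("lastId", rows[rows.length - 1].id);
    }
  },
};
```

The sketch assumes rows have a monotonically increasing `id` column. A table that is updated in place would need a different cursor, such as an `updated_at` timestamp.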

Hi, sorry, at this time VPC support is limited to workflows.

Mind if I ask about the use case? Do you need to set up a database connection source?

OK. Yes, I need to poll a database that only accepts connections from whitelisted IP addresses.

What do you recommend as the best way to architect this? My use case involves polling the database and updating records in HubSpot. The initial poll will have 30K+ rows.

We’re actually working on a feature that is separate from VPCs, which might enable this use case — can you confirm what kind of database you need to connect to? (MySQL, PostgreSQL, Snowflake, etc)

Regarding credit usage, I would recommend batching your polls, using OFFSET together with a last record ID so that each poll picks up where the previous one left off.

A workflow has a hard timeout limit of 12.5 minutes, so you can be greedy with the number of records you retrieve per run, then use a Data Store to track the last successfully processed record ID for the next batch.
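The batching advice above could be sketched roughly as follows. This is an illustration under assumptions, not Pipedream's recommended implementation. `store` stands in for a Data Store prop exposing async `get` and `set`. `fetchBatch` and `handleRow` are hypothetical callbacks (for example, a SQL query and a HubSpot update), and the `lastProcessedId` key is a made-up name. The sketch also filters on the last record ID (a keyset-style query) rather than a numeric OFFSET, which is my substitution and not something stated in the thread.

```javascript
// Process rows in batches until the table is exhausted or the time budget
// runs out, checkpointing the last successfully processed row ID.
async function processBatches({
  store,                       // stand-in for a Data Store: async get/set
  fetchBatch,                  // hypothetical: (lastId, limit) => rows with id > lastId, ordered by id
  handleRow,                   // hypothetical: processes one row, e.g. a HubSpot update
  batchSize = 500,             // arbitrary default
  budgetMs = 10 * 60 * 1000,   // stay well under the 12.5-minute workflow timeout
  now = Date.now,              // injectable clock, useful for testing
}) {
  const KEY = "lastProcessedId";
  const deadline = now() + budgetMs;
  let lastId = (await store.get(KEY)) ?? 0;
  let processed = 0;

  while (now() < deadline) {
    const rows = await fetchBatch(lastId, batchSize);
    if (rows.length === 0) break; // nothing left to process

    try {
      for (const row of rows) {
        await handleRow(row);
        lastId = row.id;          // advance only after the row succeeded
        processed += 1;
      }
    } finally {
      // Save progress even if handleRow threw partway through the batch.
      await store.set(KEY, lastId);
    }
  }
  return { processed, lastId };
}
```

The deadline is only checked between batches, so a batch that is already running can overshoot the budget. Keeping `budgetMs` comfortably below the timeout, or lowering `batchSize`, leaves headroom for that. With 30K+ rows on the initial poll, several runs may be needed, and each run resumes from the stored ID.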