Scraping Discourse with a custom Pipedream Source
Follow along as we build a custom Pipedream Source component that scrapes an entire database of Discourse topics through Discourse's REST API.
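Each run of the finished source makes one authenticated GET request to a Discourse category's JSON endpoint. It passes the API key and username as request headers and the page number as a query parameter. The sketch below builds that request on its own, mirroring the URL and headers the component uses later in this post. The helper name buildTopicListRequest and the example domain and credentials are hypothetical, and the help/5 category path is simply the one hard-coded in the component.

```javascript
// Hypothetical helper that builds the same request the source component sends.
// The domain, credentials and "help/5" category path are illustrative assumptions.
function buildTopicListRequest({ domain, apiUsername, apiKey, categoryPath, page = 0 }) {
  const url = new URL(`https://${domain}/c/${categoryPath}.json`);
  // The component starts at page 0 and increments the page on every run.
  url.searchParams.set("page", String(page));
  return {
    url: url.toString(),
    headers: {
      // Discourse reads API credentials from these two request headers.
      "Api-Key": apiKey,
      "Api-Username": apiUsername,
    },
  };
}

const req = buildTopicListRequest({
  domain: "forum.example.com",
  apiUsername: "system",
  apiKey: "YOUR_API_KEY",
  categoryPath: "help/5",
  page: 2,
});
console.log(req.url); // https://forum.example.com/c/help/5.json?page=2
```

To actually send the request outside Pipedream, you could pass req.url and { headers: req.headers } to fetch in Node 18 or later. Inside the component, the platform's axios does the same job.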
Open a Gitpod Development Environment
First, open a Gitpod Development Environment so you can follow along in a dedicated window.
Open this link to start: https://gitpod.io/#https://github.com/PipedreamHQ/pipedream
After the environment has initialized, you'll be prompted to enter your Pipedream API key, which you can find in your Pipedream account settings.
Initialize a new source component
First, run these commands to create a new directory called personal for your component to live in, and move into it:
cd components
mkdir personal
cd personal
Next, let's create the scaffolding for our new discourse-scraper source so we have a file to start from:
pd init source discourse-scraper
Finally, we can open this brand new file in our code editor:
code discourse-scraper/discourse-scraper.mjs
Steal, I mean use my code
With the code scaffolding open, you can copy and paste my code from the video:
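import { axios } from '@pipedream/platform';

export default {
  name: "Discourse Scraper",
  version: "0.0.1",
  key: "discourse-scraper",
  description: "Emit new events on each...",
  props: {
    // Connected Discourse account: supplies domain, api_username and api_key
    discourse: {
      type: "app",
      app: "discourse",
    },
    // Key-value store that persists the page cursor between runs
    db: "$.service.db",
    timer: {
      type: "$.interface.timer",
      default: {
        cron: "0 0 * * *", // Run job once a day
      },
    },
  },
  // Skip any event whose id has already been emitted
  dedupe: 'unique',
  type: "source",
  methods: {},
  async run(event) {
    // Resume from the page saved by the previous run (the first run starts at 0)
    const page = this.db.get('page') ?? 0;
    // The @pipedream/platform axios returns the response body directly
    const data = await axios(this, {
      url: `https://${this.discourse.$auth.domain}/c/help/5.json?page=${page}`,
      headers: {
        "Api-Username": this.discourse.$auth.api_username,
        "Api-Key": this.discourse.$auth.api_key,
      },
    });
    console.log(data);
    // Emit one event per topic on this page
    for (const topic of data.topic_list.topics) {
      console.log(`Emitting a single topic `, topic);
      this.$emit(
        { topic },
        {
          id: topic.id, // used by dedupe: 'unique'
          summary: topic.title,
          ts: Date.now(),
        }
      );
    }
    // Advance the cursor so the next run fetches the following page
    this.db.set('page', page + 1);
  },
};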
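The heart of run() is the page cursor kept in $.service.db. Each invocation reads the cursor, fetches that page, emits its topics, and stores page + 1 for next time. The sketch below simulates that loop outside Pipedream, using a hypothetical in-memory stand-in for this.db and a fake fetchPage function that returns canned topic lists. It also adds one variation that is not in the original code. As written, the component keeps incrementing the page after the last one, so later runs likely request empty pages. The variation instead resets the cursor to 0 when a response has no topic_list.more_topics_url. It assumes Discourse includes that field only when another page exists.

```javascript
// Hypothetical in-memory stand-in for Pipedream's $.service.db (only get/set are modeled).
class MemoryDb {
  constructor() { this.store = new Map(); }
  get(key) { return this.store.get(key); }
  set(key, value) { this.store.set(key, value); }
}

// Fake Discourse responses: page 0 advertises a next page via more_topics_url,
// page 1 does not, and any other page comes back empty.
const fakePages = {
  0: { topic_list: { topics: [{ id: 1, title: "A" }, { id: 2, title: "B" }], more_topics_url: "/c/help/5?page=1" } },
  1: { topic_list: { topics: [{ id: 3, title: "C" }] } },
};
async function fetchPage(page) {
  return fakePages[page] ?? { topic_list: { topics: [] } };
}

// One invocation of the source's run() logic, plus the reset variation.
async function runOnce(db, emitted) {
  const page = db.get("page") ?? 0;
  const data = await fetchPage(page);
  for (const topic of data.topic_list.topics) {
    emitted.push({ id: topic.id, summary: topic.title });
  }
  // Variation (assumption: more_topics_url is present only when another page exists):
  // start over from page 0 once Discourse reports no further pages.
  const hasMore = Boolean(data.topic_list.more_topics_url);
  db.set("page", hasMore ? page + 1 : 0);
}

// Simulate three scheduled (or RUN NOW) invocations.
async function main() {
  const db = new MemoryDb();
  const emitted = [];
  for (let i = 0; i < 3; i++) await runOnce(db, emitted);
  return { emitted, page: db.get("page") };
}

main().then(({ emitted, page }) => {
  console.log(emitted.map((e) => e.id)); // [ 1, 2, 3, 1, 2 ]
  console.log(page); // 1
});
```

Topics 1 and 2 come around again on the third run. That repeat is the case the component's dedupe: 'unique' setting is meant to filter. With the once-a-day cron, the cursor advances only one page per day. Clicking RUN NOW is the quicker way to walk a large category.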
Deploy this code to your Pipedream account
Back in your Gitpod terminal, run this command to deploy the component to your account:
pd dev discourse-scraper/discourse-scraper.mjs
The pd dev command watches your file. Any changes you save are pushed to the deployed source in your Pipedream account in real time.
Finally, return to the Sources section of your Pipedream account, open the new Discourse Scraper, and click RUN NOW to trigger it manually.
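Every event the source emits carries id: topic.id, and the component declares dedupe: 'unique'. Pipedream documents this strategy as keeping a bounded cache of recently emitted ids and dropping any event whose id is already in the cache. As a result, a topic fetched on two different runs is emitted only once while it is still cached. The sketch below models that behavior in plain JavaScript. The UniqueDeduper class is hypothetical and is not Pipedream's implementation. The default capacity of 100 reflects our reading of Pipedream's docs, so treat it as an assumption.

```javascript
// Hypothetical model of a "unique" dedupe strategy: remember the last
// `capacity` accepted ids and drop any event whose id is still remembered.
class UniqueDeduper {
  constructor(capacity = 100) { // capacity is an assumption, not a platform guarantee
    this.capacity = capacity;
    this.seen = new Set(); // Sets iterate in insertion order, oldest first
  }

  // Returns true if an event with this id should be emitted.
  accept(id) {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    if (this.seen.size > this.capacity) {
      // Evict the oldest accepted id so the cache stays bounded.
      const oldest = this.seen.values().next().value;
      this.seen.delete(oldest);
    }
    return true;
  }
}

const dedupe = new UniqueDeduper(3);
const incoming = [1, 2, 3, 1, 4, 1];
const emittedIds = incoming.filter((id) => dedupe.accept(id));
console.log(emittedIds); // [ 1, 2, 3, 4, 1 ]
```

With a capacity of 3, the second id 1 is dropped because it is still cached. The third id 1 is accepted again because id 4 pushed it out of the cache. A bounded cache therefore guards only against recent repeats, not against every duplicate over the source's lifetime.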
Learn more and get connected!
🔨 Start building at https://pipedream.com
📣 Read our blog https://pipedream.com/blog
💬 Join our community https://pipedream.com/community