This topic was automatically generated from Slack. You can find the original thread here.
After following a “Connecting and inserting data into a Supabase Table” from Pipedream I was able to get a basic expression to work but am stuck on how to be more efficient with my code.
The goal is to get “title”, “description” and “image_url” from webpage.
My Steps:
Step 1: Trigger on adding a URL to Google Spreadsheet
Step 2: custom_request (HTTP request)
–get / {{steps.trigger.event.newRow[2]}}
– returns html page
Step 3: insert_row (Postgres)
– this works but is using a match but probably not the best option (inefficient, multiple title tags, etc.)
title / {{${steps.custom_request.$return_value}.match(/(.**?)</title>/)[1]}}
Also tried using a code step to parse the HTML but received an error.
– tried this with import instead of require and “steps” in place of "event’ **const** cheerio = require('cheerio'); **const** html = event.custom_request.body; **const** $ = cheerio.load(html); **const** title = $('title').text(); **this**.title = title;
Got this error for all of the different varieties I tried:
*ReferenceError steps is not defined
After following a “Connecting and inserting data into a Supabase Table” from Pipedream I was able to get a basic expression to work but am stuck on how to be more efficient with my code.
The goal is to get “title”, “description” and “image_url” from webpage.
My Steps:
Step 1: Trigger on adding a URL to Google Spreadsheet
Step 2: custom_request (HTTP request)
–get / {{steps.trigger.event.newRow[2]}}
– returns html page
Step 3: insert_row (Postgres)
title / {{${steps.custom_request.$return_value}.match(/(.*?)</title>/)[1]}}
To make your code more efficient and extract the “title”, “description”, and “image_url” from the webpage, you can use a custom code step with a package like cheerio to parse the HTML and extract the required data. Here’s an example of how you can modify your workflow: Step 1: Trigger on adding a URL to Google Spreadsheet (same as before) Step 2: custom_request (HTTP request) --get / {{steps.trigger.event.newRow[2]}} – returns html page Step 3: Add a new code step to parse the HTML and extract the required data: import { axios } from "@pipedream/platform"; import cheerio from "cheerio"; const html = steps.custom_request.$return_value; const $ = cheerio.load(html); const title = $("title").text(); const description = $('meta[name="description"]').attr("content"); const image_url = $('meta[property="og:image"]').attr("content"); return { title, description, image_url, }; Step 4: insert_row (Postgres) - title / {{steps.parse_html.title}} - description / {{steps.parse_html.description}} - image_url / {{steps.parse_html.image_url}} This should make your code more efficient and easier to maintain. Make sure to replace steps.parse_html with the actual name of the code step you added in Step 3.
After trying this code I am getting an error on Step3.
I also ran this several times using ChatGPT4 to check the errors but nothing I tried resolved the error:
I simplified things for testing to only get the “title”
here is my revised script: **import** cheerio **from** "cheerio";
// Get HTML from input object **const** html = input.custom_request.$return_value;
// Load HTML into Cheerio **const** $ = cheerio.load(html);
// Extract title **const** title = $('title').text();
// Assign to exports **this**.title = title;
here is the error: ReferenceError input is not defined DETAILS
ReferenceError: input is not defined
at file:///tmp/__pdg__/dist/code/39e80a526133c5e6317deb8d779f6bdfef7d9f187f72b5ef891c00342fc06ba3/index.mjs:4:14
at ModuleJob.run (internal/modules/esm/module_job.js:183:25)
at async Loader.import (internal/modules/esm/loader.js:178:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)
at async handleMainPromise (internal/modules/run_main.js:59:12)
For the code that was first suggested I received this error: Code Error 'return' outside of function. (9:0) DETAILS
SyntaxError: 'return' outside of function. (9:0)
at _raise (/var/task/node_modules/@babel/parser/src/parser/error.js:147:45)
at raiseWithData (/var/task/node_modules/@babel/parser/src/parser/error.js:142:17)
at raise (/var/task/node_modules/@babel/parser/src/parser/error.js:91:17)
at parseReturnStatement (/var/task/node_modules/@babel/parser/src/parser/statement.js:672:12)
at parseStatementContent (/var/task/node_modules/@babel/parser/src/parser/statement.js:288:21)
at parseStatement (/var/task/node_modules/@babel/parser/src/parser/statement.js:242:17)
at parseStatement (/var/task/node_modules/@babel/parser/src/plugins/flow/index.js:1842:26)
at parseBlockOrModuleBlockBody (/var/task/node_modules/@babel/parser/src/parser/statement.js:980:25)
at parseBlockBody (/var/task/node_modules/@babel/parser/src/parser/statement.js:956:10)
at parseProgram (/var/task/node_modules/@babel/parser/src/parser/statement.js:129:10)
I apologize for the confusion earlier. Here’s the corrected code for Step 3 as a valid Pipedream component: import cheerio from "cheerio"; export default defineComponent({ props: { html: { type: "string", label: "HTML content", }, }, async run({ $ }) { const $ = cheerio.load(this.html); const title = $("title").text(); const description = $('meta[name="description"]').attr("content"); const image_url = $('meta[property="og:image"]').attr("content"); return { title, description, image_url, }; }, }); Make sure to replace this.html with the actual HTML content from the previous step (Step 2) in your workflow. Then, you can use the output from this step in Step 4 to insert the data into your Postgres table.
After multiple tries, still not getting this to work. Not sure I understand these instructions:
Make sure to replace this.html with the actual HTML content from the previous step (Step 2) in your workflow.
in Step 2 here are my settings/values:
custom_request
Get HTTP
{{steps.trigger.event.newRow[2]}}
in the “Results” Export tab:
full html output with these paths:
steps.custom_request
steps.custom_request.$return_value
and trying to return this from the html code:
How to Restore a Deck on Your Own | Family Handyman
Try pasting the code from this step into a Node.js code step. Then press the button above the code step labeled Refresh Fields. You’ll see an input field appear above the step.
Pass {{steps.custom_request.$return_value}} to the HTML input field. The code will reference that value. Press Test on that code step and you should see the title returned
I did have this in the HTML input field
and got this error: Code Error
Identifier ‘$’ has already been declared (11:10)
I then run this revised the code:
import cheerio from "cheerio";
import { defineComponent } from "@pipedreamhq/pd-backend";
export default defineComponent({
props: {
html: {
type: "string",
label: "HTML content",
},
},
async run({ html }) {
const $ = cheerio.load(html);
const title = $("title").text();
return {
title,
};
},
});
and get this error
Code Error
Identifier ‘defineComponent’ has already been declared. (23:10)
After a few revisions I am getting closer with this code: