How to Retain Google Docs Formatting When Transferring to WordPress?

user-1 · October 18, 2023, 4:15am

I am using the following code:

import { axios } from "@pipedream/platform";
import TurndownService from "turndown";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Document ID",
    },
  },
  async run({ steps, $ }) {
    // Fetch the Google Doc content as HTML
    const response = await axios(this, {
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      method: "GET",
      responseType: "text",
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
        Accept: "application/vnd.google-apps.document+html",
      },
    });

    const htmlContent = response.data;

    // Extract content between markers
    const startMarker = "==== WORDPRESS START ===";
    const endMarker = "==== WORDPRESS END ===";
    const startIndex = htmlContent.indexOf(startMarker) + startMarker.length;
    const endIndex = htmlContent.indexOf(endMarker);
    const contentBetweenMarkers = htmlContent.slice(startIndex, endIndex).trim();

    // Convert the extracted content to Markdown using TurndownService
    const turndownService = new TurndownService();
    const markdown = turndownService.turndown(contentBetweenMarkers);

    return markdown;
  },
});

And I am getting the following error:

Error - Request failed with status code 404
"!DOCTYPE html\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n **{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}** > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That's an error.</ins>\n <p>The requested URL <code>/v1/documents/1x83iPJ0bjNfVYmRsue-UDqjx_YQFDH_7cDfPoFmoihY/export</code> was not found on this server. <ins>That's all we know.</ins>\n"

user-1 · October 18, 2023, 4:15am

It looks like there’s an issue with the URL in your axios request. You have wrapped the URL with < and > characters, which are causing the error. Please remove those characters and try again. Replace this line:

url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,

with:

url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
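
The 404 page quoted earlier reports that the path ending in `/export` was not found on `docs.googleapis.com`. A minimal sketch of one alternative, assuming the export is requested through the Google Drive API's `files.export` endpoint instead, and assuming the connected Google account's token carries a Drive scope (neither is confirmed in this thread). `buildExportRequest` is a hypothetical helper name used only for illustration:

```javascript
// Hypothetical helper: builds a request config for exporting a Google Doc as HTML.
// Assumption: the Drive API v3 files.export endpoint is used, because the
// Docs API path /v1/documents/{id}/export returned 404 in this thread.
function buildExportRequest(documentId, accessToken) {
  return {
    method: "GET",
    url: `https://www.googleapis.com/drive/v3/files/${encodeURIComponent(documentId)}/export`,
    headers: { Authorization: `Bearer ${accessToken}` },
    params: { mimeType: "text/html" },
  };
}

// Example with placeholder values (not a real document ID or token).
const exportRequest = buildExportRequest("abc123", "token-placeholder");
console.log(exportRequest.url);
```

Inside a Pipedream code step, a config like this could be passed to `axios($, ...)` with `this.documentId` and `this.google.$auth.oauth_access_token` in place of the placeholders.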

user-1 · October 18, 2023, 4:15am

Let’s start over because I may be confusing you.

My workflow is triggered by HTTP. Then it pulls the Google Doc - which has already been marked up (bold, italics, header 1, header 2). The area that I want to pull is marked by:

"==== WORDPRESS START ===";
"==== WORDPRESS END ===";

I just want to take what’s in between there and I want to keep the formatting.

Originally, it was suggested to use Markdown, but I’m not sure if that is working out.

Th

user-1 · October 18, 2023, 4:15am

I understand your requirement now. You want to extract the content between “==== WORDPRESS START ===” and “==== WORDPRESS END ===” from a Google Doc while preserving the formatting. To achieve this, you can use the Google Docs API to read the content and then process it. Here’s a sample Pipedream component that demonstrates how to do this:

import { axios } from "@pipedream/platform";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run({ steps, $ }) {
    const response = await axios($, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
    });

    const startIndex = response.content.indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = response.content.indexOf("==== WORDPRESS END ===");
    const extractedContent = response.content.slice(startIndex, endIndex);

    return extractedContent;
  },
});

This component uses the Google Docs API to fetch the document content and then extracts the content between the specified markers. Note that this will return the content as plain text, and you might need to process it further to preserve the formatting as per your requirements. Please replace this.documentId with the actual Google Document ID you want to process.

user-1 · October 18, 2023, 4:15am

I received the following error.

TypeError
Cannot read properties of undefined (reading 'indexOf')

user-1 · October 18, 2023, 4:15am

I apologize for the confusion earlier. It seems there was an issue with my previous response. Let’s try a different approach to extract the content between “==== WORDPRESS START ===” and “==== WORDPRESS END ===” from the Google Doc. Please follow these steps:

1. Use the Google Docs API to fetch the document content as HTML. You can refer to the export guide for more information. Make sure to set the mimeType to 'text/html'.
2. Once you have the HTML content, you can use a library like cheerio to parse the HTML and extract the content between the markers.

Here’s an example Pipedream component that demonstrates this approach:

import { axios } from "@pipedream/platform";
import cheerio from "cheerio";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run({ steps, $ }) {
    const response = await axios($, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
      params: {
        mimeType: "text/html",
      },
    });

    const htmlContent = response;
    const $ = cheerio.load(htmlContent);

    const startIndex = $("body").html().indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = $("body").html().indexOf("==== WORDPRESS END ===");

    const extractedContent = $("body").html().slice(startIndex, endIndex);

    return extractedContent;
  },
});

Please replace this.documentId with the actual Google Document ID you want to process. This component should fetch the Google Doc content as HTML and extract the content between the specified markers while preserving the formatting.

user-1 · October 18, 2023, 4:15am

I am getting the following error when running the code:

Identifier '$' has already been declared (28:10)

user-1 · October 18, 2023, 4:15am

I apologize for the confusion in my previous response. It seems there was an issue with the code I provided. Let’s try this modified version of the code:

import { axios } from "@pipedream/platform";
import cheerio from "cheerio";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run({ $ }) {
    const response = await axios(this, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
      params: {
        mimeType: "text/html",
      },
      responseType: "text",
    });

    const htmlContent = response.data;
    const $ = cheerio.load(htmlContent);

    const startIndex = $("body").html().indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = $("body").html().indexOf("==== WORDPRESS END ===");

    const extractedContent = $("body").html().slice(startIndex, endIndex);

    return extractedContent;
  },
});

This code should fetch the Google Doc content as HTML and extract the content between the specified markers while preserving the formatting. Please replace this.documentId with the actual Google Document ID you want to process.

user-1 · October 18, 2023, 4:15am

This code is not working:

import { axios } from "@pipedream/platform";
import cheerio from "cheerio";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run({ $ }) {
    const response = await axios(this, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
      params: {
        mimeType: "text/html",
      },
      responseType: "text",
    });

    const htmlContent = response.data;
    const $ = cheerio.load(htmlContent);

    const startIndex = $("body").html().indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = $("body").html().indexOf("==== WORDPRESS END ===");

    const extractedContent = $("body").html().slice(startIndex, endIndex);

    return extractedContent;
  },
});

I’m getting the error:

Code Error
Identifier ‘$’ has already been declared (29:10)
DETAILS

SyntaxError: Identifier '$' has already been declared (29:10)
    at pp$4.raise (/var/task/node_modules/acorn/dist/acorn.js:2927:15)
    at pp$5.declareName (/var/task/node_modules/acorn/dist/acorn.js:3001:28)
    at pp$2.checkLVal (/var/task/node_modules/acorn/dist/acorn.js:1840:77)
    at pp$1.parseVarId (/var/task/node_modules/acorn/dist/acorn.js:1255:10)
    at pp$1.parseVar (/var/task/node_modules/acorn/dist/acorn.js:1237:12)
    at pp$1.parseVarStatement (/var/task/node_modules/acorn/dist/acorn.js:1101:10)
    at pp$1.parseStatement (/var/task/node_modules/acorn/dist/acorn.js:851:19)
    at anonymous.parseStatement (/var/task/node_modules/acorn-node/lib/dynamic-import/index.js:65:118)
    at anonymous.parseStatement (/var/task/node_modules/acorn-node/lib/import-meta/index.js:45:48)
    at pp$1.parseBlock (/var/task/node_modules/acorn/dist/acorn.js:1170:23)

user-1 · October 18, 2023, 4:15am

I apologize for the confusion. The issue is caused by the naming conflict between the $ parameter in the run method and the $ variable used for Cheerio. Let’s modify the code to avoid this conflict:

import { axios } from "@pipedream/platform";
import cheerio from "cheerio";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run() {
    const response = await axios(this, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
      params: {
        mimeType: "text/html",
      },
      responseType: "text",
    });

    const htmlContent = response.data;
    const cheerio$ = cheerio.load(htmlContent);

    const startIndex = cheerio$("body").html().indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = cheerio$("body").html().indexOf("==== WORDPRESS END ===");

    const extractedContent = cheerio$("body").html().slice(startIndex, endIndex);

    return extractedContent;
  },
});

In this modified version, I changed the $ variable used for Cheerio to cheerio$ to avoid the naming conflict. Please try this updated code and let me know if it resolves the issue.
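
The marker extraction in the component above adds the marker length to the result of `indexOf`, so a missing start marker (where `indexOf` returns -1) silently produces a wrong offset. A minimal sketch of the same slicing step as a standalone function that reports missing markers, assuming the HTML is already available as a string; `extractBetweenMarkers` and the sample HTML are hypothetical and only for illustration:

```javascript
// Hypothetical helper: returns the HTML between two marker strings,
// or null when either marker is missing or the end marker does not
// appear after the start marker.
function extractBetweenMarkers(html, startMarker, endMarker) {
  const startPos = html.indexOf(startMarker);
  if (startPos === -1) return null;
  const contentStart = startPos + startMarker.length;
  // Search for the end marker only after the start marker.
  const endPos = html.indexOf(endMarker, contentStart);
  if (endPos === -1) return null;
  return html.slice(contentStart, endPos).trim();
}

// Made-up sample input, not taken from a real export.
const sampleHtml =
  "<p>intro</p><p>==== WORDPRESS START ===</p>" +
  "<h1>Title</h1><p><b>Bold</b> text</p>" +
  "<p>==== WORDPRESS END ===</p>";

const fragment = extractBetweenMarkers(
  sampleHtml,
  "==== WORDPRESS START ===",
  "==== WORDPRESS END ==="
);
console.log(fragment);
```

Because the markers sit inside their own paragraphs, a plain string slice like this may begin with the closing tag of the start-marker paragraph and end with the opening tag of the end-marker paragraph, so the fragment may need cleanup before it is sent to WordPress.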

user-1 · October 18, 2023, 4:15am

@UMT4G7E5P @U02SX7ETBFB @U02A06JPKEH

You’ve written this code several times and each time there is an error.

Here is the most recent error message.

Error - Request failed with status code 404
"!DOCTYPE html\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n **{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}** > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That's an error.</ins>\n <p>The requested URL <code>/v1/documents/1x83iPJ0bjNfVYmRsue-UDqjx_YQFDH_7cDfPoFmoihY/export?mimeType=text%2Fhtml</code> was not found on this server. <ins>That's all we know.</ins>\n"
DETAILS
    at null.createError (/tmp/__pdg__/dist/code/042e847f6e7c7ca7097d24e804142c62b41c460b6ee4a0d2a914569a64c64742/node_modules/.pnpm/axios@0.21.4/node_modules/axios/lib/core/createError.js:16:15)
    at null.settle (/tmp/__pdg__/dist/code/042e847f6e7c7ca7097d24e804142c62b41c460b6ee4a0d2a914569a64c64742/node_modules/.pnpm/axios@0.21.4/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/tmp/__pdg__/dist/code/042e847f6e7c7ca7097d24e804142c62b41c460b6ee4a0d2a914569a64c64742/node_modules/.pnpm/axios@0.21.4/node_modules/axios/lib/adapters/http.js:269:11)
    at IncomingMessage.emit (node:events:529:35)
    at null.endReadableNT (node:internal/streams/readable:1368:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

user-1 · October 18, 2023, 4:15am

I’m sorry to hear that you’re encountering an error. Based on the error message, it seems like the requested URL was not found on the server, resulting in a 404 error. To help you further, I’ll need more information about the code you’re running and the API you’re trying to access. Can you please provide more details about your code and the API endpoint you’re trying to call?

user-1 · October 18, 2023, 4:15am

Here is the code that is not working:

import { axios } from "@pipedream/platform";
import cheerio from "cheerio";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run() {
    const response = await axios(this, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
      params: {
        mimeType: "text/html",
      },
      responseType: "text",
    });

    const htmlContent = response.data;
    const cheerio$ = cheerio.load(htmlContent);

    const startIndex = cheerio$("body").html().indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = cheerio$("body").html().indexOf("==== WORDPRESS END ===");

    const extractedContent = cheerio$("body").html().slice(startIndex, endIndex);

    return extractedContent;
  },
});

user-1 · October 18, 2023, 4:15am

I noticed an issue in your code related to the URL you’re trying to access. The angle brackets < and > around the URL should be removed. Please update the url property in the axios call like this:

url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,

Here’s the corrected code:

import { axios } from "@pipedream/platform";
import cheerio from "cheerio";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run() {
    const response = await axios(this, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}/export`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
      params: {
        mimeType: "text/html",
      },
      responseType: "text",
    });

    const htmlContent = response.data;
    const cheerio$ = cheerio.load(htmlContent);

    const startIndex = cheerio$("body").html().indexOf("==== WORDPRESS START ===") + "==== WORDPRESS START ===".length;
    const endIndex = cheerio$("body").html().indexOf("==== WORDPRESS END ===");

    const extractedContent = cheerio$("body").html().slice(startIndex, endIndex);

    return extractedContent;
  },
});

user-1 · October 18, 2023, 4:15am

There is something wrong with the code that you gave me.

I was able to run the GOOGLE DOCS GET DOCUMENT and it was able to pull the document just fine.

So, I’d like to change our method. Now that I have the document using the GOOGLE DOCS GET DOCUMENT, is there a way to only work with the content in between the ==== WORDPRESS START === and ==== WORDPRESS END === area?

user-1 · October 18, 2023, 4:15am

I understand that you want to extract the content between “==== WORDPRESS START ===” and “==== WORDPRESS END ===” from the Google Document. Since you’ve successfully used the GOOGLE DOCS GET DOCUMENT, you can modify the code to extract the desired content. Here’s an example:

import { axios } from "@pipedream/platform";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    documentId: {
      type: "string",
      label: "Google Document ID",
    },
  },
  async run() {
    const response = await axios(this, {
      method: "GET",
      url: `https://docs.googleapis.com/v1/documents/${this.documentId}`,
      headers: {
        Authorization: `Bearer ${this.google.$auth.oauth_access_token}`,
      },
    });

    const documentContent = response.data.body.content;
    let startIndex = -1;
    let endIndex = -1;

    for (let i = 0; i < documentContent.length; i++) {
      const element = documentContent[i];
      if (element.paragraph && element.paragraph.elements) {
        const textRun = element.paragraph.elements[0].textRun;
        if (textRun && textRun.content) {
          if (textRun.content.includes("==== WORDPRESS START ===")) {
            startIndex = i;
          } else if (textRun.content.includes("==== WORDPRESS END ===")) {
            endIndex = i;
            break;
          }
        }
      }
    }

    if (startIndex !== -1 && endIndex !== -1) {
      const extractedContent = documentContent.slice(startIndex + 1, endIndex);
      return extractedContent;
    } else {
      return "Unable to find the specified content markers.";
    }
  },
});

user-1 · October 18, 2023, 4:15am

I don’t want to modify the code. I would prefer to write a new code step that takes the output from GOOGLE DOCS GET DOCUMENT (steps.get_document) and sifts through that data.

user-1 · October 18, 2023, 4:15am

I understand that you want to process the output from the GOOGLE DOCS GET DOCUMENT step (steps.get_document). Here’s a code snippet that takes the output from the previous step and extracts the content between “==== WORDPRESS START ===” and “==== WORDPRESS END ===”:

export default defineComponent({
  async run({ steps }) {
    const documentContent = steps.get_document.data.body.content;
    let startIndex = -1;
    let endIndex = -1;

    for (let i = 0; i < documentContent.length; i++) {
      const element = documentContent[i];
      if (element.paragraph && element.paragraph.elements) {
        const textRun = element.paragraph.elements[0].textRun;
        if (textRun && textRun.content) {
          if (textRun.content.includes("==== WORDPRESS START ===")) {
            startIndex = i;
          } else if (textRun.content.includes("==== WORDPRESS END ===")) {
            endIndex = i;
            break;
          }
        }
      }
    }

    if (startIndex !== -1 && endIndex !== -1) {
      const extractedContent = documentContent.slice(startIndex + 1, endIndex);
      return extractedContent;
    } else {
      return "Unable to find the specified content markers.";
    }
  },
});
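
The loop above inspects only `elements[0]` of each paragraph. A minimal sketch of a variant that joins every text run in a paragraph before looking for a marker, assuming the input is an array shaped like the Docs API `body.content` list (paragraphs with `elements[].textRun.content`). The helper names and the sample data are hypothetical, and whether split runs are what actually affects this particular document is not known from the thread:

```javascript
// Joins the text of every textRun in a paragraph element.
// Returns "" for elements that are not paragraphs.
function paragraphText(element) {
  const runs = (element.paragraph && element.paragraph.elements) || [];
  return runs.map((el) => (el.textRun && el.textRun.content) || "").join("");
}

// Hypothetical helper: returns the structural elements strictly between
// the two marker paragraphs, or null if either marker is not found.
function sliceBetweenMarkerParagraphs(content, startMarker, endMarker) {
  const startIdx = content.findIndex((el) => paragraphText(el).includes(startMarker));
  if (startIdx === -1) return null;
  const endIdx = content.findIndex(
    (el, i) => i > startIdx && paragraphText(el).includes(endMarker)
  );
  if (endIdx === -1) return null;
  return content.slice(startIdx + 1, endIdx);
}

// Made-up sample: the start marker is split across two text runs.
const sampleContent = [
  { sectionBreak: {} },
  { paragraph: { elements: [
    { textRun: { content: "==== WORDPRESS " } },
    { textRun: { content: "START ===\n" } },
  ] } },
  { paragraph: { elements: [{ textRun: { content: "Body paragraph\n" } }] } },
  { paragraph: { elements: [{ textRun: { content: "==== WORDPRESS END ===\n" } }] } },
];

const between = sliceBetweenMarkerParagraphs(
  sampleContent,
  "==== WORDPRESS START ===",
  "==== WORDPRESS END ==="
);
console.log(between.map(paragraphText));
```

In a Pipedream step, `sampleContent` would be replaced by whatever path in `steps.get_document` holds the document's `body.content` array.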

user-1 · October 18, 2023, 4:15am

The code that you gave me:

export default defineComponent({
  async run({ steps }) {
    const documentContent = steps.get_document.data.body.content;
    let startIndex = -1;
    let endIndex = -1;

    for (let i = 0; i < documentContent.length; i++) {
      const element = documentContent[i];
      if (element.paragraph && element.paragraph.elements) {
        const textRun = element.paragraph.elements[0].textRun;
        if (textRun && textRun.content) {
          if (textRun.content.includes("==== WORDPRESS START ===")) {
            startIndex = i;
          } else if (textRun.content.includes("==== WORDPRESS END ===")) {
            endIndex = i;
            break;
          }
        }
      }
    }

    if (startIndex !== -1 && endIndex !== -1) {
      const extractedContent = documentContent.slice(startIndex + 1, endIndex);
      return extractedContent;
    } else {
      return "Unable to find the specified content markers.";
    }
  },
});

Produced this error:

{"$return_value":"Unable to find the specified content markers."}

user-1 · October 18, 2023, 4:15am

I apologize for the confusion. It seems that the code I provided was unable to find the “==== WORDPRESS START ===” and “==== WORDPRESS END ===” markers in the document content. To better assist you, could you please provide a sample of the Google Document content or its structure? This will help me understand the format and adjust the code accordingly.
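
Since the goal of the thread is to keep bold, italics, and headings, here is a rough sketch of how the structural elements between the markers might be turned into HTML, assuming they follow the Docs API document structure (`paragraphStyle.namedStyleType`, `textRun.textStyle.bold`, `textRun.textStyle.italic`). All function names and the sample data are hypothetical, and only paragraphs, Heading 1, Heading 2, bold, and italic are handled:

```javascript
// Escapes characters that are special in HTML text.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Maps Docs named paragraph styles to HTML tags; anything else becomes <p>.
const TAG_BY_STYLE = { HEADING_1: "h1", HEADING_2: "h2" };

// Renders one text run, wrapping it in <em>/<strong> when styled.
function renderRun(textRun) {
  // Drop the trailing newline that ends a paragraph's last run.
  let html = escapeHtml((textRun.content || "").replace(/\n$/, ""));
  const style = textRun.textStyle || {};
  if (html && style.italic) html = `<em>${html}</em>`;
  if (html && style.bold) html = `<strong>${html}</strong>`;
  return html;
}

// Renders one structural element; non-paragraphs are skipped in this sketch.
function renderParagraph(element) {
  const paragraph = element.paragraph;
  if (!paragraph) return "";
  const styleType = (paragraph.paragraphStyle || {}).namedStyleType;
  const tag = TAG_BY_STYLE[styleType] || "p";
  const inner = (paragraph.elements || [])
    .map((el) => (el.textRun ? renderRun(el.textRun) : ""))
    .join("");
  return inner ? `<${tag}>${inner}</${tag}>` : "";
}

function renderElements(elements) {
  return elements.map(renderParagraph).filter(Boolean).join("\n");
}

// Made-up sample data shaped like Docs API structural elements.
const sampleElements = [
  { paragraph: {
    paragraphStyle: { namedStyleType: "HEADING_1" },
    elements: [{ textRun: { content: "Title\n", textStyle: {} } }],
  } },
  { paragraph: {
    paragraphStyle: { namedStyleType: "NORMAL_TEXT" },
    elements: [
      { textRun: { content: "Some ", textStyle: {} } },
      { textRun: { content: "bold", textStyle: { bold: true } } },
      { textRun: { content: " text\n", textStyle: {} } },
    ],
  } },
];

const renderedHtml = renderElements(sampleElements);
console.log(renderedHtml);
```

Producing HTML directly from the document structure avoids the Markdown round trip mentioned earlier in the thread, at the cost of having to add a case for each additional style (lists, links, tables) the document uses.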