How to reference a variable when using youtube-transcript-api in Python to retrieve and store a YouTube video transcript?

user-1 · December 7, 2023, 10:10am

This topic was automatically generated from Slack. You can find the original thread here.

I’m using youtube-transcript-api in Python to retrieve a transcript of a YouTube video. I need to store it in a data store that can be used in the next step for further processing.
However I currently can’t see/reference the variable.
How can I fix this?

Here’s my custom Python code:
from youtube_transcript_api import YouTubeTranscriptApi
import json

# Replace 'Zh0b6YTvVP4' with the actual video ID of the YouTube video
video_id = 'Zh0b6YTvVP4'

try:
    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
    transcript = transcript_list.find_generated_transcript(['en']).fetch()
    # Convert transcript to a JSON string
    transcript_json = json.dumps(transcript)
except Exception as e:
    print("Error retrieving transcript:", e)
    transcript_json = json.dumps({"error": str(e)})

# Store the transcript in a variable that Pipedream recognizes for passing data
return_value = transcript_json

user-1 · December 7, 2023, 10:10am

To store the transcript in a Pipedream data store for use in the next step of your workflow, add a data store as an input to your Python code step, then use pd.inputs["data_store"] to interact with it. Here’s how you can modify your Python code to store the transcript:

```
from youtube_transcript_api import YouTubeTranscriptApi
import json

def handler(pd: "pipedream"):
    video_id = 'Zh0b6YTvVP4'
    data_store = pd.inputs["data_store"]

    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_generated_transcript(['en']).fetch()
        transcript_json = json.dumps(transcript)
        data_store["transcript"] = transcript_json
    except Exception as e:
        print("Error retrieving transcript:", e)
        data_store["transcript"] = json.dumps({"error": str(e)})

    # You don't need to return anything since you're storing the data in the data store
```
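If the next step only needs the transcript once, a data store may not be required at all. As far as I know, whatever the handler returns is exported as the step's return value. The sketch below works under that assumption. `build_export` is a hypothetical helper, and the segment keys (`text`, `start`, `duration`) assume the list-of-dicts format returned by the older youtube-transcript-api interface used in this thread.

```python
def build_export(transcript):
    """Bundle transcript segments with their joined text (hypothetical helper)."""
    text = " ".join(segment["text"] for segment in transcript)
    return {"segments": transcript, "text": text}


def handler(pd: "pipedream"):
    # Imported inside the handler so build_export can be exercised on its own.
    from youtube_transcript_api import YouTubeTranscriptApi

    video_id = "Zh0b6YTvVP4"
    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_generated_transcript(["en"]).fetch()
    except Exception as e:
        print("Error retrieving transcript:", e)
        return {"error": str(e)}
    # Assumption: the returned dict becomes this step's export for later steps.
    return build_export(transcript)
```

A later Python step could then probably read the value with something like pd.steps["<step name>"]["$return_value"], where the step name is whatever the code step is called in the workflow; printing pd.steps is a quick way to confirm the exact key.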

user-1 · December 7, 2023, 10:10am

so the data store should be called “transcript”?

user-1 · December 7, 2023, 10:10am

Based on the context provided, it seems like you are asking about naming a data store in Pipedream. If you are setting up a data store to hold transcripts, naming it “transcript” would be appropriate as it clearly indicates the type of data it will contain. Here’s an example of how you might define a data store for transcripts in a Pipedream component:

export default defineComponent({
  props: {
    transcriptDataStore: { type: "data_store" },
  },
  async run({ steps, $ }) {
    // Example usage: saving a transcript to the data store
    await this.transcriptDataStore.set('latestTranscript', 'Your transcript data here');

    // Example usage: retrieving a transcript from the data store
    const latestTranscript = await this.transcriptDataStore.get('latestTranscript');
    // Do something with the transcript
  },
});

In this example, the data store is named transcriptDataStore, which is descriptive of its purpose. Remember to choose a name that makes sense for your specific use case and helps you easily identify the data store’s contents.
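For a Python step, the same set/get idea might look like the sketch below. Note that in the earlier Python code, "transcript" is the key of a record inside the data store, not the name of the data store itself; the store is whichever one is selected for the step's data_store input. `save_transcript` and `load_transcript` are hypothetical helpers, and a plain dict stands in for the data store in the local illustration at the bottom.

```python
TRANSCRIPT_KEY = "transcript"  # record key inside the data store, not the store's name


def save_transcript(data_store, transcript_json):
    """Write the transcript JSON string under a fixed key."""
    data_store[TRANSCRIPT_KEY] = transcript_json


def load_transcript(data_store):
    """Return the stored JSON string, or None if nothing has been saved yet."""
    if TRANSCRIPT_KEY in data_store:
        return data_store[TRANSCRIPT_KEY]
    return None


def handler(pd: "pipedream"):
    # Assumption: a data store is attached to this step as the "data_store" input.
    return load_transcript(pd.inputs["data_store"])


# Local illustration only: a dict mimics the data store's key/value interface.
fake_store = {}
save_transcript(fake_store, '[{"text": "hi"}]')
```

The later step has to attach the same data store and use the same key; a different store or a different key would look like the data is missing.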

user-1 · December 7, 2023, 10:10am

This is my current custom Python code. Is it storing the YouTube video transcript in a data store called “transcript”?

I ask because I can’t retrieve the data store in subsequent steps (it isn’t found).

from youtube_transcript_api import YouTubeTranscriptApi
import json

def handler(pd: "pipedream"):
    video_id = 'Zh0b6YTvVP4'
    data_store = pd.inputs["data_store"]

    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_generated_transcript(['en']).fetch()
        transcript_json = json.dumps(transcript)
        data_store["transcript"] = transcript_json
    except Exception as e:
        print("Error retrieving transcript:", e)
        data_store["transcript"] = json.dumps({"error": str(e)})

    # You don't need to return anything since you're storing the data in the data store

user-1 · December 7, 2023, 10:10am

Based on the Python code you’ve provided, it appears that you are storing the YouTube video transcript under the key “transcript” in the data store attached to the step. There are a few points to consider:

1. The code uses pd.inputs["data_store"] to reference the data store, which suggests that you have defined a data store input in your Pipedream component. Ensure that this input is correctly configured in your component’s properties.
2. The code correctly attempts to save the transcript to the data store using data_store["transcript"] = transcript_json. This should work provided that the data store is correctly set up and accessible.
3. If you’re having trouble retrieving the data from the data store in subsequent steps, make sure that you are referencing the same data store with the correct key, and that the record is not being cleared or overwritten by another step.

If you continue to face issues, please ensure that the data store is correctly configured in your workflow and that you are accessing it properly in subsequent steps. If you need further assistance, you can visit Pipedream’s support page for more help.
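One thing worth checking in the later step: the code above writes either the transcript or an error record under the same "transcript" key, so the consumer has to tell the two apart. Here is a small sketch of that check. `parse_stored_transcript` is a hypothetical helper, and the stored value is assumed to be the JSON string produced by json.dumps in the code above.

```python
import json


def parse_stored_transcript(raw):
    """Return (segments, error) decoded from the stored JSON string."""
    if raw is None:
        return None, "no transcript stored under this key"
    data = json.loads(raw)
    # The except branch stores {"error": "..."} instead of a list of segments.
    if isinstance(data, dict) and "error" in data:
        return None, data["error"]
    return data, None
```

Returning a pair keeps the two outcomes explicit, so the next step can stop early on an error instead of treating the error record as transcript text.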

user-1 · December 7, 2023, 10:10am

I have this in the first YouTube api step

resourceId {2}
  kind: youtube#video
  videoId: 8aOf2nSLdUA

I need to use the videoId dynamically in the next step, which is the custom Python code.

from youtube_transcript_api import YouTubeTranscriptApi
import json

def handler(pd: "pipedream"):
    video_id = 'Zh0b6YTvVP4'
    data_store = pd.inputs["data_store"]

    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_generated_transcript(['en']).fetch()
        transcript_json = json.dumps(transcript)
        data_store["transcript"] = transcript_json
    except Exception as e:
        print("Error retrieving transcript:", e)
        data_store["transcript"] = json.dumps({"error": str(e)})

    # You don't need to return anything since you're storing the data in the data store

So instead of video_ID = specific ID, it should look something like this:
video_ID = videoId (this is the variable from the previous step)

user-1 · December 7, 2023, 10:10am

To use the videoId from the previous step dynamically in your Python code, you’ll need to access the output of that step using the steps object provided by Pipedream. Here’s how you can modify your Python code to use the videoId from the previous step:

from youtube_transcript_api import YouTubeTranscriptApi
import json

def handler(pd: "pipedream"):
    # Retrieve the videoId from the previous step's output
    video_id = pd.steps["previous_step_name"].output["resourceId"]["videoId"]
    data_store = pd.inputs["data_store"]

    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_generated_transcript(['en']).fetch()
        transcript_json = json.dumps(transcript)
        data_store["transcript"] = transcript_json
    except Exception as e:
        print("Error retrieving transcript:", e)
        data_store["transcript"] = json.dumps({"error": str(e)})

    # You don't need to return anything since you're storing the data in the data store

Please replace previous_step_name with the actual name of the step that outputs the videoId. In Pipedream, each step has a name, and you can reference the outputs of that step using pd.steps["step_name"].output. If you’re not sure of the step name, you can find it in the Pipedream UI where you configure your workflow steps.

user-1 · December 7, 2023, 10:10am

The following code gives me this error.

Traceback (most recent call last):

  File "/nano-py/pipedream/worker.py", line 118, in execute
    user_retval = handler(pd)

  File "/tmp/__pdg__/dist/code/b10935c9bd4697d82ac58a846ecb1fac9500f66e026d38775b52e0a9b49bfffc/code.py", line 6, in handler
    video_id = pd.steps["trigger"].output["resourceId"]["videoId"]

AttributeError: 'dict' object has no attribute 'output'

from youtube_transcript_api import YouTubeTranscriptApi
import json

def handler(pd: "pipedream"):
    # Retrieve the videoId from the previous step's output
    video_id = pd.steps["trigger"].output["resourceId"]["videoId"]
    data_store = pd.inputs["data_store"]

    try:
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
        transcript = transcript_list.find_generated_transcript(['en']).fetch()
        transcript_json = json.dumps(transcript)
        data_store["transcript"] = transcript_json
    except Exception as e:
        print("Error retrieving transcript:", e)
        data_store["transcript"] = json.dumps({"error": str(e)})

    # You don't need to return anything since you're storing the data in the data store

user-1 · December 7, 2023, 10:10am

It looks like you’re trying to access the output of a previous step in a Pipedream Python code step using pd.steps["trigger"].output["resourceId"]["videoId"], but you’re encountering an AttributeError because pd.steps["trigger"] is a dictionary and doesn’t have an output attribute.

In Pipedream’s Python environment, the data passed between steps is accessed through the pd object, but the syntax is different from what you’ve used. The correct way to access step exports in Python is through the pd.steps dictionary with the step name as the key, and then directly accessing the exported data. Here’s how you can access the videoId if it was exported from a previous step named trigger:

video_id = pd.steps["trigger"]["resourceId"]["videoId"]

Please replace "trigger" with the actual name of the step that contains the videoId you want to access. If you’re unsure of the step name or the structure of the exported data, you can log the pd.steps object to inspect it:

print(pd.steps)

This will output the structure of the pd.steps object, allowing you to see the names of the steps and the data they’ve exported, so you can adjust your code accordingly.
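Since the exact nesting depends on the step, a defensive lookup can help while exploring. The sketch below uses a hypothetical helper, `dig`, and a made-up steps dictionary. The extra "event" level is only an assumption about how trigger data may be nested, so compare it against the real print(pd.steps) output before relying on it.

```python
from collections.abc import Mapping


def dig(data, *keys):
    """Follow keys through nested mappings; return None if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


# Made-up example data; the real structure comes from your own workflow.
example_steps = {
    "trigger": {
        "event": {"resourceId": {"kind": "youtube#video", "videoId": "8aOf2nSLdUA"}}
    }
}

# Try the path from the answer first, then the assumed "event" nesting.
video_id = dig(example_steps, "trigger", "resourceId", "videoId") or dig(
    example_steps, "trigger", "event", "resourceId", "videoId"
)
```

Inside a handler, the same call would take pd.steps in place of example_steps, and a None result signals that the path needs adjusting rather than raising a KeyError mid-run.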

user-1 · December 7, 2023, 10:10am

What Notion step do I use if I want to add a page to an already existing database?

user-1 · December 7, 2023, 10:10am

To add a page to an existing Notion database, you can use the “Add a database page” action in Pipedream. Here’s how you can do it:

1. Search for the Notion app in the Pipedream UI.
2. Look for the “Add a database page” action. This action allows you to create a new page within a specific database in Notion.

You’ll need to provide the necessary information such as the database ID and the properties for the page you want to create. If you need further assistance or have any other questions, feel free to visit Pipedream’s support page.