Why am I getting a Python install error related to google-cloud-datatransfer and directory permissions?

This topic was automatically generated from Slack. You can find the original thread here.

Not sure if this is something I did wrong or an error with the Python action:

Error
```
Command failed: python3 -m pip install --target=/tmp/__pdg__/dist/python -r requirements.txt --upgrade

WARNING: The directory '/home/sbx_user1051/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.

ERROR: Could not find a version that satisfies the requirement google-cloud-datatransfer (from versions: none)
ERROR: No matching distribution found for google-cloud-datatransfer

[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
```

Hi , I believe this is because google-cloud-datatransfer could not be installed in the Pipedream environment. Given the serverless nature of Pipedream, some libraries might not be compatible, and this library is one of them.

May I ask if you could find another library, or what’s your use case for it? Maybe Pipedream can implement an action for you.
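One more thing worth checking on the error itself: pip’s “(from versions: none)” usually means it found no installable release under that exact name on PyPI, and the cache-permission warning above it is unrelated noise, not the cause of the failure. The BigQuery Data Transfer client library is published on PyPI as google-cloud-bigquery-datatransfer rather than google-cloud-datatransfer, so it may also be worth trying that name in requirements.txt:

```
google-cloud-bigquery-datatransfer
```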

Hi ,

Thank you for the response!

I’m looking to create a scheduled query using a service account.

Here’s the link to the documentation.

Thanks , I created the ticket here for Pipedream to implement the action for you.

For now, I’ve checked the GCP API docs and see that it’s possible to use their APIs instead of their Python library. Would you mind giving it a try in your code step?
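For reference, scheduled queries live in the BigQuery Data Transfer API as transfer configs with the scheduled_query data source; the transferConfigs endpoint and field names below follow the Scheduling queries docs linked above. A minimal, untested sketch of the REST call with requests, assuming access_token already holds a valid OAuth token for the https://www.googleapis.com/auth/bigquery scope and using placeholder project/dataset/table names:

```python
import requests

project_id = "YOUR_PROJECT_ID"

# A scheduled query is a transfer config with dataSourceId "scheduled_query"
transfer_config = {
    "displayName": "daily_copy",  # any label you like
    "dataSourceId": "scheduled_query",
    "destinationDatasetId": "YOUR_DATASET_ID",
    "schedule": "every 24 hours",
    "params": {
        "query": "SELECT * FROM `YOUR_PROJECT_ID.YOUR_DATASET_ID.YOUR_TABLE_ID`",
        "destination_table_name_template": "YOUR_TABLE_ID_scheduled",
        "write_disposition": "WRITE_TRUNCATE",
    },
}

response = requests.post(
    f"https://bigquerydatatransfer.googleapis.com/v1/projects/{project_id}/locations/us/transferConfigs",
    headers={"Authorization": f"Bearer {access_token}"},
    json=transfer_config,
)
response.raise_for_status()
print(response.json())
```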

Leo, thank you so much. The API method does work, but the authentication is hard. How do I get the API key and the access token?
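With a service account there’s actually no API key involved; you mint a short-lived OAuth access token from the JSON key file. A minimal sketch using the google-auth library, where key_json is assumed to be the parsed key file as a dict (where you load it from is up to you):

```python
import google.auth.transport.requests
from google.oauth2 import service_account

# Build scoped credentials from the service account key
credentials = service_account.Credentials.from_service_account_info(
    key_json,
    scopes=["https://www.googleapis.com/auth/bigquery"],
)

# Refresh to obtain a short-lived OAuth2 access token
credentials.refresh(google.auth.transport.requests.Request())
access_token = credentials.token  # pass as "Authorization: Bearer <token>"
```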

I think Pi can help you with this; I’ll ask it for you.

Please write a Python code step that uses the Google Cloud app and the Google Cloud BigQuery API to create a scheduled query: Scheduling queries | BigQuery | Google Cloud

To create a scheduled query using the Google Cloud BigQuery API, you can use the following Python code step in your Pipedream workflow. Make sure to replace YOUR_PROJECT_ID, YOUR_DATASET_ID, and YOUR_TABLE_ID with the appropriate values for your use case.

```python
import json
import requests
from google.oauth2 import service_account

# Replace with your own values
project_id = "YOUR_PROJECT_ID"
dataset_id = "YOUR_DATASET_ID"
table_id = "YOUR_TABLE_ID"

# Set up Google Cloud authentication
credentials = service_account.Credentials.from_service_account_info(pd.secret.google_cloud)
scoped_credentials = credentials.with_scopes(["https://www.googleapis.com/auth/bigquery"])

# Get the access token
access_token = scoped_credentials.get_access_token().access_token

# Define the scheduled query configuration
scheduled_query = {
    "scheduleOptions": {
        "startTime": "2023-01-01T00:00:00Z",
        "endTime": "2023-12-31T00:00:00Z",
        "timeZone": "UTC"
    },
    "query": f"SELECT * FROM `{project_id}.{dataset_id}.{table_id}`",
    "destinationTable": {
        "projectId": project_id,
        "datasetId": dataset_id,
        "tableId": f"{table_id}_scheduled"
    },
    "writeDisposition": "WRITE_TRUNCATE",
    "schedule": "every 24 hours"
}

# Make the API request to create the scheduled query
url = f"https://bigquerydatatransfer.googleapis.com/v1/projects/{project_id}/locations/us/transferConfigs"
headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json"
}
response = requests.post(url, headers=headers, data=json.dumps(scheduled_query))

# Check the response and return the result
if response.status_code == 200:
    result = response.json()
    print(f"Scheduled query created: {result}")
else:
    print(f"Error creating scheduled query: {response.text}")
```

This code assumes you have a Google Cloud service account JSON key stored as a Pipedream secret named google_cloud. If you haven’t done that, please follow the instructions in the Pipedream Google Cloud integration docs to set up the authentication.

I think the code is quite broken

Please don’t use it

LOL. You have no idea how much an expert eye means. I would have wasted hours trying it out.

Is it possible to use a Node.js code step? I think it’s better with Node.js.

I would prefer all Node.js. I thought I had to use Python.

Yes, you can use Node.js code steps in Pipedream workflows. Here’s an example of how you can create a scheduled query using the Google BigQuery API with a Node.js code step. Make sure to replace YOUR_PROJECT_ID, YOUR_DATASET_ID, and YOUR_TABLE_ID with the appropriate values for your use case.

```javascript
import { axios } from "@pipedream/platform";
import { google } from "googleapis";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    projectId: "YOUR_PROJECT_ID",
    datasetId: "YOUR_DATASET_ID",
    tableId: "YOUR_TABLE_ID",
  },
  async run({ $ }) {
    const auth = new google.auth.GoogleAuth({
      credentials: this.google.$auth,
      scopes: ["https://www.googleapis.com/auth/bigquery"],
    });
    const authClient = await auth.getClient();
    const bigquery = google.bigquery({ version: "v2", auth: authClient });

    const scheduledQuery = {
      configuration: {
        query: {
          query: `SELECT * FROM \`${this.projectId}.${this.datasetId}.${this.tableId}\``,
          destinationTable: {
            projectId: this.projectId,
            datasetId: this.datasetId,
            tableId: `${this.tableId}_scheduled`,
          },
          writeDisposition: "WRITE_TRUNCATE",
        },
        schedule: {
          startTime: "2023-01-01T00:00:00Z",
          endTime: "2023-12-31T00:00:00Z",
          timeZone: "UTC",
          every: 24 * 60 * 60 * 1000, // every 24 hours in milliseconds
        },
      },
    };

    const response = await bigquery.jobs.insert({
      projectId: this.projectId,
      resource: scheduledQuery,
    });

    if (response.status === 200) {
      console.log(`Scheduled query created: ${JSON.stringify(response.data)}`);
    } else {
      console.log(`Error creating scheduled query: ${response.statusText}`);
    }
  },
});
```

This code assumes you have a Google Cloud service account JSON key stored as a Pipedream secret named google_cloud. If you haven’t done that, please follow the instructions in the Pipedream Google Cloud integration docs to set up the authentication.

Please write a Node.js code step that uses the Pipedream Google Cloud app and the Google Cloud BigQuery API to create a scheduled query: Scheduling queries | BigQuery | Google Cloud

Here’s a Node.js code step that uses the Pipedream Google Cloud app and the Google Cloud BigQuery API to create a scheduled query. Make sure to replace YOUR_PROJECT_ID, YOUR_DATASET_ID, and YOUR_TABLE_ID with the appropriate values for your use case.

```javascript
import { axios } from "@pipedream/platform";
import { google } from "googleapis";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    projectId: "YOUR_PROJECT_ID",
    datasetId: "YOUR_DATASET_ID",
    tableId: "YOUR_TABLE_ID",
  },
  async run({ $ }) {
    const auth = new google.auth.GoogleAuth({
      credentials: this.google.$auth,
      scopes: ["https://www.googleapis.com/auth/bigquery"],
    });
    const authClient = await auth.getClient();
    const bigquery = google.bigquery({ version: "v2", auth: authClient });

    const scheduledQuery = {
      configuration: {
        query: {
          query: `SELECT * FROM \`${this.projectId}.${this.datasetId}.${this.tableId}\``,
          destinationTable: {
            projectId: this.projectId,
            datasetId: this.datasetId,
            tableId: `${this.tableId}_scheduled`,
          },
          writeDisposition: "WRITE_TRUNCATE",
        },
        schedule: {
          startTime: "2023-01-01T00:00:00Z",
          endTime: "2023-12-31T00:00:00Z",
          timeZone: "UTC",
          every: 24 * 60 * 60 * 1000, // every 24 hours in milliseconds
        },
      },
    };

    const response = await bigquery.jobs.insert({
      projectId: this.projectId,
      resource: scheduledQuery,
    });

    if (response.status === 200) {
      console.log(`Scheduled query created: ${JSON.stringify(response.data)}`);
    } else {
      console.log(`Error creating scheduled query: ${response.statusText}`);
    }
  },
});
```

This code assumes you have a Google Cloud service account JSON key stored as a Pipedream secret named google_cloud. If you haven’t done that, please follow the instructions in the Pipedream Google Cloud integration docs to set up the authentication.

Oh, I think Pi answered it already here. Checking the code, I think it makes sense (though I haven’t tested it). Would you mind trying it out? You can ask Pi about any errors you face when executing the code.

Thank you so much! Going to try it now :slightly_smiling_face:

I’m getting an error in this code:

Code Error
```
unsupported type for prop projectId: my-project-id
```

It looks like the projectId prop is missing the type property. You should define the type for the projectId prop as “string”. Here’s the updated code with the correct prop definitions:

```javascript
import { axios } from "@pipedream/platform";
import { google } from "googleapis";

export default defineComponent({
  props: {
    google: {
      type: "app",
      app: "google",
    },
    projectId: {
      type: "string",
      default: "YOUR_PROJECT_ID",
    },
    datasetId: {
      type: "string",
      default: "YOUR_DATASET_ID",
    },
    tableId: {
      type: "string",
      default: "YOUR_TABLE_ID",
    },
  },
  async run({ $ }) {
    const auth = new google.auth.GoogleAuth({
      credentials: this.google.$auth,
      scopes: ["https://www.googleapis.com/auth/bigquery"],
    });
    const authClient = await auth.getClient();
    const bigquery = google.bigquery({ version: "v2", auth: authClient });

    const scheduledQuery = {
      configuration: {
        query: {
          query: `SELECT * FROM \`${this.projectId}.${this.datasetId}.${this.tableId}\``,
          destinationTable: {
            projectId: this.projectId,
            datasetId: this.datasetId,
            tableId: `${this.tableId}_scheduled`,
          },
          writeDisposition: "WRITE_TRUNCATE",
        },
        schedule: {
          startTime: "2023-01-01T00:00:00Z",
          endTime: "2023-12-31T00:00:00Z",
          timeZone: "UTC",
          every: 24 * 60 * 60 * 1000, // every 24 hours in milliseconds
        },
      },
    };

    const response = await bigquery.jobs.insert({
      projectId: this.projectId,
      resource: scheduledQuery,
    });

    if (response.status === 200) {
      console.log(`Scheduled query created: ${JSON.stringify(response.data)}`);
    } else {
      console.log(`Error creating scheduled query: ${response.statusText}`);
    }
  },
});
```

Make sure to replace YOUR_PROJECT_ID, YOUR_DATASET_ID, and YOUR_TABLE_ID with the appropriate values for your use case.