What is the Most Effective Way to Upload 100k Records into a Datastore?

This topic was automatically generated from Slack. You can find the original thread here.

We’re having a really hard time uploading 100k records into a datastore. It’s a list of zip codes. Can someone please lend us a hand in the most effective way to get this uploaded? @U04EQQMEAMS

when we try to upload the records in bulk or broken up in chunks, it often crashes partway through, and then we can’t save state very easily. From what I can tell in the documentation there’s no way to upload the file directly. The file is only 6 MB, so we’re not crazy
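For context, one common pattern for this kind of bulk write is to process the records in fixed-size chunks and persist a checkpoint after each chunk, so a crashed run can resume where it left off instead of starting over. This is a minimal sketch, not Pipedream-specific: the `data_store` dict and `records` list below are stand-ins for the real data store and the parsed zip-code list, and the `_checkpoint` key is a hypothetical bookkeeping convention, not an API feature.

```python
# Sketch: chunked upload with a resumable checkpoint.
# `data_store` stands in for a dict-like data store, and `records`
# for the parsed zip-code list; both are hypothetical here.

def upload_in_chunks(data_store, records, chunk_size=1000):
    # Resume from the last saved checkpoint, if any
    start = data_store.get("_checkpoint", 0)
    for i in range(start, len(records), chunk_size):
        for item in records[i : i + chunk_size]:
            data_store[item["key"]] = item["value"]
        # Persist progress so a crashed run can resume at this offset
        data_store["_checkpoint"] = i + chunk_size

data_store = {}
records = [{"key": f"zip{n}", "value": n} for n in range(2500)]
upload_in_chunks(data_store, records)
```

If the run dies mid-way, calling `upload_in_chunks` again with the same store skips the chunks already written; only the chunk in flight at crash time is redone, which is safe because the writes are idempotent key/value puts.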

Hi, could you share your code that updates the data store and the specific error you’re getting?

You’re correct that there’s not a native file upload for data stores but I agree that would be great to see. At the very least I think you should be able to clone a workflow to do it for common file formats like CSVs and JSON.
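As a sketch of that clone-a-workflow idea for CSVs (the column names `zip` and `city` here are assumptions for illustration, not a real Pipedream template), a code step could parse the uploaded CSV text into the key/value record shape a data-store write loop expects:

```python
import csv
import io

# Hypothetical CSV payload; the "zip" and "city" columns are assumptions
csv_text = "zip,city\n85363,Youngtown\n85364,Yuma\n"

def csv_to_records(text):
    # Turn each CSV row into the {"key": ..., "value": ...} shape
    # that a data-store write loop can consume
    reader = csv.DictReader(io.StringIO(text))
    return [{"key": row["zip"], "value": row["city"]} for row in reader]

records = csv_to_records(csv_text)
# records[0] == {"key": "85363", "value": "Youngtown"}
```

The same approach works for JSON with `json.loads` in place of `csv.DictReader`.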

this is our code:

```python
def handler(pd: "pipedream"):
    # Access the data store under pd.inputs
    data_store = pd.inputs["data_store"]

    for item in pd.steps["trigger"]["event"]["body"]:
        # Store a value under a key
        data_store[item["key"]] = item["value"]

        # Retrieve the value and print it to the step's Logs
        print(data_store[item["key"]])
```

This is the error:

```
KeyError

Traceback (most recent call last):

  File "/nano-py/pipedream/worker.py", line 118, in execute
    user_retval = handler(pd)

  File "/pipedream/dist/code/58ecef3d76943ec5c6bca4ce78a6d08903b900488911d6945eebe52b7d03784b/code.py", line 10, in handler
    print(data_store[item["key"]])

  File "/nano-py/pipedream/pipedream.py", line 237, in __getitem__
    raise KeyError

KeyError
```

We could of course throw in a try/except, but I think the question is more about why we need that for something that seems so simple?

Thanks. First I’d like to see how our support bot does on this, and I’ll share some more detail / docs to help you troubleshoot

I understand that you’re facing issues with uploading 100k records into a datastore and encountering a KeyError. Based on the code and error you’ve provided, it seems that there might be an issue with the keys in the items you’re processing. To help you troubleshoot this issue, I would recommend adding some error handling and logging to your code. This will help you identify the specific item causing the issue. Here’s an example of how you can modify your code:

```python
def handler(pd: "pipedream"):
    data_store = pd.inputs["data_store"]
    for item in pd.steps["trigger"]["event"]["body"]:
        try:
            data_store[item["key"]] = item["value"]
            print(data_store[item["key"]])
        except KeyError:
            print(f"Error processing item: {item}")
```

You may have also seen, but there’s also probably not a key field in the item that you’re iterating over if you’re hitting a KeyError. So you can handle the absence of that. @U05A7KLNALC How do I best handle the absence of the key field in item?

To handle the absence of the key field in item, you can use a conditional check to ensure the key field exists before processing the item. Here’s an example of how you can modify your code to handle this case:

```python
def handler(pd: "pipedream"):
    data_store = pd.inputs["data_store"]
    for item in pd.steps["trigger"]["event"]["body"]:
        if "key" in item and "value" in item:
            data_store[item["key"]] = item["value"]
            print(data_store[item["key"]])
        else:
            print(f"Error processing item: {item}, missing 'key' or 'value' field")
```

This code checks if both the key and value fields are present in the item before processing it. If either field is missing, it prints an error message with the problematic item.

Let me know if all that helps

If I just rerun it, it will work with no edits

but I guess what I’m showing here is that if I rerun it with no errors, then why didn’t it run initially with no errors?

I don’t know — did you happen to select a different event from the trigger step?

The KeyError from that specific code should have thrown because of the absence of a key field in item and doesn’t look like it’s an error with our data store API. But if you observe something similar and can confirm that the item in question indeed does have a key, but we’re still throwing this error, definitely let me know.

It errored out on item value 85363, which does have a key

here are the value printouts up until 85363, when it errored

thanks, it certainly may be some issue internally. Could you copy the full key here and I can see if we have logs on the PUT?

f205ac30bcd736097eb07e55060d75f2

Thanks. I do see an error from our load balancer updating that record on that one attempt. But I don’t see any issue with the data store handling that load / no errors on our end. But if you see it again with large updates, please let me know.

okay thank you