How to Write Python Code for Processing Beats and Lyrics Using Pipedream?

user-1 · May 29, 2024, 12:29pm

This topic was automatically generated from Slack. You can find the original thread here.

you’re so good at your work.

I am now writing my first ever Python step in Pipedream - you’re going to have to hold my hand!!

Here’s the code:

import json
import pandas as pd
import xlsxwriter  # Add this import

pd.set_option('display.max_rows', None)

# Load the JSON data for beats and lyrics
with open("beatMap.json", "r") as f:
    beats = json.load(f)
with open("lyrics.json", "r") as f:
    lyrics = json.load(f)

# Preprocess lyrics to add midpoints and calculate durations for each word
for lyric in lyrics:
    for word in lyric['words']:
        word['midpoint'] = (word['start'] + word['end']) / 2
        word['duration'] = word['end'] - word['start']

# Group beats into bars and format beat dictionaries correctly
bars = {}
bar_times = []
current_bar = 1
for i, beat in enumerate(beats):
    if (i % 4) == 0 and i != 0:
        current_bar += 1
    if current_bar not in bars:
        bars[current_bar] = []
        bar_times.append(beat['time'])

    # Initialize beat dictionary with 'start', 'end', and 'lyric' keys
    beat_dict = {
        'time': beat['time'],
        'start': beat['time'],
        'end': beats[i + 1]['time'] if i + 1 < len(beats) else None,  # Ensure 'end' is set
        'lyric': []
    }
    bars[current_bar].append(beat_dict)

def allocate_words_to_bars(bar_times, lyrics, bars):
    for bar_num, beats in bars.items():
        for beat in beats:
            beat['lyric'] = []  # Clear lyrics to ensure fresh start

        for lyric in lyrics:
            for word in lyric['words']:
                best_fit = None
                max_overlap = 0
                for beat in beats:
                    if beat['end'] is None or word['end'] < beat['start']:
                        continue
                    overlap = min(beat['end'], word['end']) - max(beat['start'], word['start'])
                    if overlap > 0:
                        overlap_percentage = overlap / (word['end'] - word['start'])
                        if word['start'] >= beat['start'] and word['start'] < beat['end']:
                            overlap_percentage += 0.5  # Increase weight for starting alignment
                        if overlap_percentage > max_overlap:
                            max_overlap = overlap_percentage
                            best_fit = beat

                if best_fit and max_overlap > 0.3:  # Adjusted threshold
                    best_fit['lyric'].append(word['word'])

        # Fill empty beats with their beat number
        for i, beat in enumerate(beats):
            if not beat['lyric']:
                beat['lyric'].append(str(i + 1))

# Example call to this function
allocate_words_to_bars(bar_times, lyrics, bars)

# After defining bars and before creating the DataFrame
allocate_words_to_bars(bar_times, lyrics, bars)

# Prompt for the file name
file_name = input("Enter the file name for the Excel output (without extension): ") + ".xlsx"

# Create DataFrame and populate it using direct indexing instead of append
df_beats = pd.DataFrame(columns=['Time', 'Bar', '1', '2', '3', '4'])

for bar, beats in bars.items():
    row = {'Bar': bar}
    # Convert the start time of the first beat to "00:00" format
    start_time = beats[0]['start']
    minutes = int(start_time // 60)
    seconds = int(start_time % 60)
    row['Time'] = f"{minutes:02}:{seconds:02}"

    for i, beat in enumerate(beats):
        row[str(i+1)] = ' '.join(beat['lyric'])
    df_beats.loc[len(df_beats)] = row

# Write the DataFrame to an Excel file using XlsxWriter
with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
    df_beats.to_excel(writer, index=False, sheet_name='Beats')
    worksheet = writer.sheets['Beats']  # Must match the sheet_name written above
    worksheet.set_column('A:A', 5, None, {'align': 'center'})
    worksheet.set_column('B:B', 3.33, None, {'align': 'center'})
    worksheet.set_column('C:F', 13.33)

    # Ensure numeric columns are written as numbers
    for col_num, value in enumerate(df_beats.columns.values):
        if value != 'Time' and value != 'Bar':
            for row_num, cell_value in enumerate(df_beats[value], start=1):
                if isinstance(cell_value, (int, float)):
                    worksheet.write_number(row_num, col_num, cell_value)
                else:
                    worksheet.write_string(row_num, col_num, cell_value)

A few things about it:

Files saved as /tmp/lyrics.json and /tmp/beatMap.json

Re:

# Prompt for the file name
file_name = input("Enter the file name for the Excel output (without extension): ") + ".xlsx"

I want to use “steps.Moisis_Pause_For_Processing_Compeptition.jobDetails.name” - however, this will be an audio file name, for example “Crash Charli XCX.mp3” - so I need to disregard the “.mp3”

user-1 · May 29, 2024, 12:29pm

It looks like you’re doing some advanced data processing with Python in Pipedream! Let’s address your requirements step by step.

Firstly, since you’re working within a Pipedream Python step, the way you access variables and interact with the filesystem is a bit different than standard Python scripts. Also, considering the Pipedream environment, writing to the /tmp directory is correct, and modifying the file name based on steps data from a previous step is a common task.

Given your need to use "steps.Moisis_Pause_For_Processing_Compeptition.jobDetails.name" as the file name and remove the .mp3 extension, you can achieve this with Python’s string manipulation methods. Here’s how you can adjust your code to incorporate these changes:

def handler(pd: "pipedream"):
    # Assuming steps.Moisis_Pause_For_Processing_Compeptition.jobDetails.name is passed correctly
    # and contains something like "Crash Charli XCX.mp3"
    audio_file_name = pd.steps["Moisis_Pause_For_Processing_Compeptition"]["jobDetails"]["name"]

    # Remove the ".mp3" extension
    base_file_name = audio_file_name.rsplit('.', 1)[0]

    # Append ".xlsx" to create the Excel file name
    file_name = f"/tmp/{base_file_name}.xlsx"

    # Your existing code for processing and creating the DataFrame goes here

    # When saving the DataFrame to an Excel file
    with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
        df_beats.to_excel(writer, index=False, sheet_name='Beats')
        # Additional configuration for the worksheet as before

    # Return the path of the saved file or any other data you wish to pass to the next steps
    return {"saved_file_path": file_name}

A few key points in this solution:

- The .rsplit('.', 1)[0] method is used to remove the .mp3 extension from the file name. This method splits the string at the last occurrence of ., and [0] selects the first part of the split result, effectively removing the extension.
- The file name is prefixed with /tmp/ to ensure it’s written to the temporary directory, which is writable in the Pipedream environment.
- The pd.ExcelWriter part remains mostly unchanged,
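
As a side note on the extension handling, here is a minimal sketch of the same idea using os.path.splitext from the standard library instead of rsplit. The helper name excel_path_for and its out_dir parameter are hypothetical, introduced only for illustration; only the example file name comes from this thread.

```python
import os


def excel_path_for(audio_file_name: str, out_dir: str = "/tmp") -> str:
    """Build an .xlsx path in out_dir from an audio file name.

    Hypothetical helper for illustration; not part of Pipedream or pandas.
    """
    # Keep only the final path component in case the name includes directories.
    base = os.path.basename(audio_file_name)
    # splitext splits off the last extension (".mp3" here); a name with
    # no extension is returned unchanged.
    stem, _extension = os.path.splitext(base)
    return os.path.join(out_dir, f"{stem}.xlsx")


print(excel_path_for("Crash Charli XCX.mp3"))
```

One thing to watch when combining this with pandas: inside handler(pd), the name pd refers to the Pipedream object passed to the handler, not to pandas. Importing pandas under a different alias (for example, plain import pandas) is one possible way to keep calls such as ExcelWriter pointing at the library.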

user-1 · May 29, 2024, 12:29pm

I tried this code:

import json
import pandas as pd
import xlsxwriter

pd.set_option('display.max_rows', None)

def handler(pd: "pipedream"):
    # Load the JSON data for beats and lyrics
    with open("/tmp/beatMap.json", "r") as f:
        beats = json.load(f)
    with open("/tmp/lyrics.json", "r") as f:
        lyrics = json.load(f)

    # Preprocess lyrics to add midpoints and calculate durations for each word
    for lyric in lyrics:
        for word in lyric['words']:
            word['midpoint'] = (word['start'] + word['end']) / 2
            word['duration'] = word['end'] - word['start']

    # Group beats into bars and format beat dictionaries correctly
    bars = {}
    bar_times = []
    current_bar = 1
    for i, beat in enumerate(beats):
        if (i % 4) == 0 and i != 0:
            current_bar += 1
        if current_bar not in bars:
            bars[current_bar] = []
            bar_times.append(beat['time'])
        # Initialize beat dictionary with 'start', 'end', and 'lyric' keys
        beat_dict = {
            'time': beat['time'],
            'start': beat['time'],
            'end': beats[i + 1]['time'] if i + 1 < len(beats) else None,
            'lyric': []
        }
        bars[current_bar].append(beat_dict)

    def allocate_words_to_bars(bar_times, lyrics, bars):
        for bar_num, beats in bars.items():
            for beat in beats:
                beat['lyric'] = []  # Clear lyrics to ensure fresh start
            for lyric in lyrics:
                for word in lyric['words']:
                    best_fit = None
                    max_overlap = 0
                    for beat in beats:
                        if beat['end'] is None or word['end'] < beat['start']:
                            continue
                        overlap = min(beat['end'], word['end']) - max(beat['start'], word['start'])
                        if overlap > 0:
                            overlap_percentage = overlap / (word['end'] - word['start'])
                            if word['start'] >= beat['start'] and word['start'] < beat['end']:
                                overlap_percentage += 0.5  # Increase weight for starting alignment
                            if overlap_percentage > max_overlap:
                                max_overlap = overlap_percentage
                                best_fit = beat
                    if best_fit and max_overlap > 0.3:  # Adjusted threshold
                        best_fit['lyric'].append(word['word'])
            # Fill empty beats with their beat number
            for i, beat in enumerate(beats):
                if not beat['lyric']:
                    beat['lyric'].append(str(i + 1))

    # Example call to this function
    allocate_words_to_bars(bar_times, lyrics, bars)

    # Fetch the file name from the previous step, removing the ".mp3" extension
    audio_file_name = pd.steps["Moisis_Pause_For_Processing_Compeptition"]["jobDetails"]["name"]
    base_file_name = audio_file_name.rsplit('.', 1)[0]
    file_name = f"/tmp/{base_file_name}.xlsx"

    # Create DataFrame and populate it using direct indexing instead of append
    df_beats = pd.DataFrame(columns=['Time', 'Bar', '1', '2', '3', '4'])
    for bar, beats in bars.items():
        row = {'Bar': bar}
        # Convert the start time of the first beat to "00:00" format
        start_time = beats[0]['start']
        minutes = int(start_time // 60)
        seconds = int(start_time % 60)
        row['Time'] = f"{minutes:02}:{seconds:02}"
        for i, beat in enumerate(beats):
            row[str(i+1)] = ' '.join(beat['lyric'])
        df_beats.loc[len(df_beats)] = row

    # Write the DataFrame to an Excel file using XlsxWriter
    with pd.ExcelWriter(file_name, engine='xlsxwriter') as writer:
        df_beats.to_excel(writer, index=False, sheet_name='Beats')
        worksheet = writer.sheets['Beats']
        worksheet.set_column('A:A', 5, None, {'align': 'center'})
        worksheet.set_column('B:B', 3.33, None, {'align': 'center'})
        worksheet.set_column('C:F', 13.33)
        # Ensure numeric columns are written as numbers
        for col_num, value in enumerate(df_beats.columns.values):
            if value != 'Time' and value != 'Bar':
                for row_num, cell_value in enumerate(df_beats[value], start=1):
                    if isinstance(cell_value, (int, float)):
                        worksheet.write_number(row_num, col_num, cell_value)
                    else:
                        worksheet.write_string(row_num, col_num, cell_value)

    return {"saved_file_path": file_name}

But it couldn’t find the file at /tmp/beatMap.json.

user-1 · May 29, 2024, 12:29pm

It seems like you’re encountering an issue where your Python code cannot find the specified files in the /tmp directory. This could happen for a few reasons, but most commonly, it’s because the files either do not exist at the specified location or there was an issue with the way they were saved or expected to be accessed.

First, ensure that the files beatMap.json and lyrics.json were correctly saved to the /tmp directory. If you’re running this code as part of a Pipedream workflow, remember that files saved to /tmp in one step are accessible in subsequent steps, but this directory is cleared when the workflow execution is completed. This means if you’re trying to access files saved from a previous execution or from an external source, you’ll need to ensure they’re present in the /tmp directory within the same execution context.

If you’re confident the files should be there, double-check the file paths and names are correct and match exactly, including case sensitivity. File paths and names in Unix-like systems, including the environment where Pipedream workflows run, are case-sensitive.

Here’s a simplified example of how you might ensure a file exists in /tmp using Python code in Pipedream:

import os

def handler(pd: "pipedream"):
    # Check if the file exists
    beat_map_path = "/tmp/beatMap.json"
    lyrics_path = "/tmp/lyrics.json"

    if not os.path.exists(beat_map_path):
        return {"error": f"File not found: {beat_map_path}"}
    if not os.path.exists(lyrics_path):
        return {"error": f"File not found: {lyrics_path}"}

    # If both files exist, proceed with your logic
    return {"message": "Both files exist. Proceeding with processing."}

This code snippet checks if both beatMap.json and lyrics.json exist in the /tmp directory and returns an error message if either is missing. If the files are found, it proceeds with the rest of your logic (which you would need to add in place of the final return statement).

If you’re still encountering issues, consider how and where these files are being generated or uploaded to /tmp. If they’re supposed to be generated by a previous step in your workflow, ensure that step is executing correctly and actually creating the files.
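
To go one step further than an existence check, a rough sketch like the one below also reports what is actually in the directory, which can reveal a file saved under a slightly different name (different case, or a missing .json suffix, for instance). The helper name check_inputs and the keys of the returned dictionary are assumptions made for this illustration, not part of Pipedream.

```python
import os


def check_inputs(directory, expected_names):
    """Report which expected files are present in or missing from a directory.

    Hypothetical helper for illustration; names and return shape are assumptions.
    """
    present = [
        name for name in expected_names
        if os.path.isfile(os.path.join(directory, name))
    ]
    missing = [name for name in expected_names if name not in present]
    return {
        "present": present,
        "missing": missing,
        # Everything actually in the directory, to help spot renamed files.
        "directory_listing": sorted(os.listdir(directory)),
    }


def handler(pd: "pipedream"):
    report = check_inputs("/tmp", ["beatMap.json", "lyrics.json"])
    if report["missing"]:
        # Returning the listing makes it visible in the step's exported output.
        return {"error": "Expected files are missing", **report}
    return {"message": "Both files exist. Proceeding with processing.", **report}
```

If the listing shows the files under other names, the fix is probably in the step that saves them rather than in the Python step.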

user-1 · May 29, 2024, 12:29pm

This is the code I’m using to download the files to tmp:

import { axios } from "@pipedream/platform";
import fs from "fs";

export default defineComponent({
  props: {
    filename1: {
      type: "string",
      label: "Filename for the first JSON",
      description: "Enter the filename for the first JSON file.",
    },
    filename2: {
      type: "string",
      label: "Filename for the second JSON",
      description: "Enter the filename for the second JSON file.",
    },
    url1: {
      type: "string",
      label: "URL for the first JSON",
      description: "Enter the URL to download the first JSON file from.",
    },
    url2: {
      type: "string",
      label: "URL for the second JSON",
      description: "Enter the URL to download the second JSON file from.",
    },
  },
  async run({ steps, $ }) {
    // Download the first JSON file
    const response1 = await axios($, {
      url: this.url1,
      method: "GET",
    });
    fs.writeFileSync(`/tmp/${this.filename1}`, JSON.stringify(response1));

    // Download the second JSON file
    const response2 = await axios($, {
      url: this.url2,
      method: "GET",
    });
    fs.writeFileSync(`/tmp/${this.filename2}`, JSON.stringify(response2));

    // Access the specified paths in the workflow
    const lyrics = steps.Moisis_Pause_For_Processing_Compeptition.jobDetails.result.lyrics;
    const beatMap = steps.Moisis_Pause_For_Processing_Compeptition.jobDetails.result.beatMap;

    // Assuming you want to do something with these paths, you can process them here
    // For example, you could log them or include them in the returned message
    console.log("Lyrics:", lyrics);
    console.log("BeatMap:", beatMap);

    return {
      message: `Files saved as /tmp/${this.filename1} and /tmp/${this.filename2}`,
      lyrics,
      beatMap,
    };
  },
});