- Published on
Simple Song Finder
- Authors
- Name
- Linell Bonnette
After over a decade of writing software professionally... sometimes things get boring. I'm thankful that my job at SBLive is actually pretty interesting, but at some level my day to day work follows a pretty consistent rhythm. Not long ago though, a coworker hit me up about a tiny project that sounded pretty fun.
I have a very random, non work related, question for you lol. I downloaded my Spotify listening history and I’m searching for a Chinese song that I found while I was living there. I have no idea what the title of the song is, but I know it’s in Mandarin. How could I take my data set and search for either Mandarin or non-English language titles? 😂
My initial response was that I bet ChatGPT could answer the question for him. I tried to just add the file into the assistants playground and ask it do the heavy lifting... but the results weren't great.
I poked a little bit more, but at the step where I was trying to figure out how to vectorize and search across it more easily, I realized that I was drastically overcomplicating things. A quick bit of Googling later, I ended up slapping together this quick little script:
import json
from langdetect import detect
def load_and_map_music_history(file_path):
with open(file_path, 'r') as file:
music_data = json.load(file)
unique_tracks = {}
for entry in music_data:
track_name = entry["master_metadata_track_name"]
artist_name = entry["master_metadata_album_artist_name"]
album_name = entry["master_metadata_album_album_name"]
connection_country = entry["conn_country"]
language = "unknown"
try:
track_language = detect(track_name)
artist_language = detect(artist_name)
album_language = detect(album_name)
except:
language = "unknown"
if track_name not in unique_tracks:
if ('zh' in track_language or 'zh' in artist_language or 'zh' in album_language):
unique_tracks[track_name] = {
"artist_name": artist_name,
"album_name": album_name
}
return unique_tracks
track_info_dict = load_and_map_music_history('music_history.json')
for track, info in track_info_dict.items():
print(f"{track} by {info['artist_name']} from the album {info['album_name']}")
I didn't try to optimize it at all, it's the quickest and dirtiest solution that I could make happen. And it even works! Here's the output:
Why Nobody Fights by Hua Chen Yu from the album 卡西莫多的禮物
二愣子 by Jordan Chan from the album 抱一抱
經典 (feat. 懂伯 & DJ Premier) by 大支 from the album 不聽
還是覺得你最好 b
It was a nice little reminder that sometimes it's okay to just hack something together and call it a day.
The Robots Probably Still Win
OpenAI has made some upgrades since I initially played around with this, and it's worth pointing out that it definitely can solve the problem without needing to lift a finger. I'm not sure whether my script or OpenAI is more correct.