Personal project to present an interface to the public domain CBS Radio Mystery Theater library. Problems:
CBSMRT.com…; can I ethically scrape since it's public domain?
https://archive.org/download/cbs_radio_mystery_theater/cbs_radio_mystery_theater-0701-0750.zip/cbs_radio_mystery_theater-0701-0750%2Fcbsrmt_0749_neatness_counts.mp3
You actually began this post to record how to trim MP3s on the command line. ffmpeg can do it, and you can download a precompiled binary from ffmpeg.org. You can't snip audio out, but you can make a new audio file (without re-compressing) by copying just a portion of the source:
$ ./ffmpeg -i your-input.mp3 -vn -acodec copy -ss 00:00:00 -to 00:02:40 your-output.mp3
You could take all the portions you want to keep, then concatenate them with ffmpeg.
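A minimal sketch of that keep-and-concatenate idea, assuming the segments to keep are already known as start/end time strings. It only builds the command lines: one stream-copy trim per segment (same flags as above), plus the list file and command for ffmpeg's concat demuxer. The helper names and the part-NN.mp3 intermediate file names are made up for illustration.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (start, end) as HH:MM:SS strings


def trim_commands(src: str, keep: List[Segment], ffmpeg: str = "./ffmpeg") -> List[List[str]]:
    """Build one stream-copy ffmpeg command per segment to keep."""
    commands = []
    for index, (start, end) in enumerate(keep, start=1):
        part = f"part-{index:02d}.mp3"  # hypothetical intermediate file name
        commands.append([
            ffmpeg, "-i", src, "-vn", "-acodec", "copy",
            "-ss", start, "-to", end, part,
        ])
    return commands


def concat_list(parts: List[str]) -> str:
    """Text of the list file read by ffmpeg's concat demuxer (one file per line)."""
    return "".join(f"file '{name}'\n" for name in parts)


def concat_command(list_file: str, out: str, ffmpeg: str = "./ffmpeg") -> List[str]:
    """Join the listed parts without re-encoding."""
    return [ffmpeg, "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", out]
```

Each command list could be passed to subprocess.run(cmd, check=True) after writing the concat_list() text to a file. File names containing a single quote would need escaping in the list file, which this sketch does not handle.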
Also maybe: https://unix.stackexchange.com/questions/182602/trim-audio-file-using-start-and-stop-times
Also: https://askubuntu.com/questions/927308/how-to-crop-edit-mp3-files
Using <a href="…" download="filename.mp3"> sometimes works and sometimes does not with archive.org. Chrome and Firefox both state that it shouldn't work for cross-origin requests - perhaps sometimes archive.org puts in the "right" header and sometimes it does not.
A possible solution would be to download the data in the client as a blob, then save it. Idea from this MS polyfill and this SO about saving a blob:
https://github.com/jelmerdemaat/dwnld-attr-polyfill/blob/master/src/download-polyfill.js
https://stackoverflow.com/questions/25547475/save-to-local-file-from-blob
Note: the ffmpeg executable must be in the same folder as the Swift script, ~/Projects/web/mystery-theater-browser/tools. You downloaded a precompiled version from here even though it's not compiled for ARM/Apple silicon. You had to manually open it from the Finder to get the "unverified; run anyway?" prompt before the script was allowed to run it.
S key) to find boundaries of commercial breaks; it will play the area around the cursor as you move it. Use play cut preview (UI unknown but has default shortcut C key) to play the area just before and after the selection. Unfortunately, switching between those modes clears the selection. Use Command-E to fit the view to the selection and Command-F to fit the view to the full length.
tools path of the mystery-theater-browser project, e.g. ~/Projects/web/mystery-theater-browser/tools
./mp3-extract.swift <episode #> <in.mp3> <exported labels file> ...
./mp3-extract.swift 57 /Volumes/Time\ Machine/mystery-theater/cbsmrt-ken-long-collection/br/740608\ The\ Fall\ Of\ The\ House\ Of\ Usher\ WOR.mp3 /Volumes/Time\ Machine/mystery-theater/cbsmrt-ken-long-collection/br/740608\ The\ Fall\ Of\ The\ House\ Of\ Usher\ WOR.txt
episode # is used for automatically renaming the resulting MP3.
/Volumes/Time Machine/mystery-theater/nrl-no-ads
.mp3 extension to in the third column, NRL Filename. For the above, that would be the string 0057 740608 The Fall Of The House Of Usher WOR (no ads).
~/Projects/web/mystery-theater-browser-content-support (~/Projects/Experiments 2021/Mystery Radio Scraper). It will parse the CSV files at the path exported to above and output the content files that the site can read.
If you want to add an episode but don't need to create an ad-free version, you can follow the steps above starting with step 10, but don't put anything in the ad-free variant column.
What would be better:
A tool that could:
Discovered (from a review on archive.org) that two episodes from the BoA collection didn't download properly and weren't titled properly. Both contained $. Assuming some issue with a Bash script (is that what I used?).
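If a shell script was involved (an assumption; the post is not sure), a $ in an unquoted or double-quoted file name would be treated as the start of a variable and expanded away. A small sketch of quoting names before they reach a shell, using Python's shlex.quote. The curl command, URL, and file name here are made up for illustration and are not what was actually used.

```python
import shlex


def download_command(url: str, filename: str) -> str:
    """Build a curl command line with both arguments quoted for a POSIX shell."""
    return f"curl -L -o {shlex.quote(filename)} {shlex.quote(url)}"


# Hypothetical title containing "$"; single quotes stop the shell expanding it.
print(download_command("https://example.org/a.mp3", "The $1000 Question.mp3"))
# prints: curl -L -o 'The $1000 Question.mp3' https://example.org/a.mp3
```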
Incorrect filenames:
Correct filenames:
You installed whisper. General usage:
~/Projects/other-experiments/whisper $ whisper --model [tiny or small].en --language en [path.mp3]
Creates text files at [path.mp3.txt] and [path.mp3.vtt], where the .txt is just the text, and the .vtt is a human-readable captioning format with time stamps. The small model is good, the tiny model is fast.
Of interest for two reasons:
The small model is about 1:1 for time-to-transcribe, that is, it takes an hour to transcribe an hour of audio.
The tiny model seems to take about 10 minutes for an episode. Then you can search for "be back" or "for act" to try to find commercial breaks and see their time stamps. This actually works well - much more pleasant and faster to scan text than to scrub audio.
There's a Python library. So close to, but not quite able to, just having it transcribe an episode, finding the ad breaks, and trimming them automatically. For example, time stamps don't take into account music; lines don't care about different voices (I'll be back for Act II shortly. Your local Buick dealer is offering…).
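The text search could be scripted rather than done by eye. A rough sketch, assuming the .vtt file has the usual WebVTT layout (a "start --> end" timing line followed by the cue text, with blank lines between cues). The function name and the sample file name are hypothetical, and it only lists candidate cues; it does nothing about the music and speaker problems noted above.

```python
import re
from typing import List, Tuple

# A WebVTT cue timing line looks like "14:58.000 --> 15:03.000"
CUE_TIMING = re.compile(r"^(\S+) --> (\S+)")


def find_cues(vtt_text: str, phrases: List[str]) -> List[Tuple[str, str, str]]:
    """Return (start, end, text) for every cue whose text contains a phrase."""
    hits = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                # Everything after the timing line is the cue text
                text = " ".join(lines[i + 1:]).strip()
                if any(p.lower() in text.lower() for p in phrases):
                    hits.append((match.group(1), match.group(2), text))
                break
    return hits
```

Usage might look like find_cues(open("episode.mp3.vtt").read(), ["be back", "for act"]), then checking each reported time stamp by ear before trimming.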