Nathan Lamont

Notes to Self

Mystery Theater

Personal project to present interface to public domain CBS Radio Mystery Theater library. Problems:

  1. Top hit is this sub-optimal site which is ad-laden and broadly anti-user. However, operator seems protective of his work (there's a crudely-named anti-ad-block script; all audio file names start with CBSMRT.com…). Can I ethically scrape since it's public domain?
  2. This separate site is specifically user-friendly, although quite plain — it's meant for low-vision users. Great resource to compare against #1. Different episode descriptions. No audio files.
  3. Internet Archive lets you download, for example, but the URLs may not be predictable (ia800703.us.archive.org?). However, the file sizes are smaller, and ads have been removed, at least in your random samples. cbsmrt.com claims they have some superior recordings. OH! Internet Archive's actual URLs are completely usable, assuming you can hot-link to them, e.g. https://archive.org/download/cbs_radio_mystery_theater/cbs_radio_mystery_theater-0701-0750.zip/cbs_radio_mystery_theater-0701-0750%2Fcbsrmt_0749_neatness_counts.mp3

You actually began this post to record how to trim MP3s on the command line. Ffmpeg can do it, and you can download a precompiled binary from ffmpeg.org. You can't snip audio out, but you can make a new audio file (without re-compressing) by copying just a portion of the source:

$ ./ffmpeg -i your-input.mp3 -vn -acodec copy -ss 00:00:00 -to 00:02:40 your-output.mp3

You could take all the portions you want to keep, then concatenate them with ffmpeg.
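
A minimal sketch of that keep-and-concatenate idea, in Python. It only builds the ffmpeg command lines and the list file that ffmpeg's concat demuxer reads; it does not run anything. The function names, the part file names, and the time ranges are all made up for illustration, and the ./ffmpeg path is assumed to match the command above.

```python
import shlex


def trim_command(src, start, end, dest, ffmpeg="./ffmpeg"):
    """Build the ffmpeg arguments that copy start..end of src into dest."""
    # Same flags as the one-off command above: no video, copy the audio stream.
    return [ffmpeg, "-i", src, "-vn", "-acodec", "copy",
            "-ss", start, "-to", end, dest]


def concat_list(parts):
    """Return the text of a list file for ffmpeg's concat demuxer."""
    # One "file '<path>'" line per part; a single quote inside a path
    # is written as '\'' so the quoting stays balanced.
    return "".join("file '{}'\n".format(p.replace("'", "'\\''")) for p in parts)


def concat_command(list_path, dest, ffmpeg="./ffmpeg"):
    """Build the ffmpeg arguments that join the listed parts without re-encoding."""
    return [ffmpeg, "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", dest]


if __name__ == "__main__":
    # Hypothetical portions to keep (everything between them would be dropped).
    keep = [("00:00:00", "00:12:10"), ("00:14:40", "00:26:30")]
    parts = []
    for i, (start, end) in enumerate(keep):
        part = "part-{:02d}.mp3".format(i)
        parts.append(part)
        print(shlex.join(trim_command("your-input.mp3", start, end, part)))
    print(concat_list(parts), end="")
    print(shlex.join(concat_command("parts.txt", "your-output.mp3")))
```

You would save the concat_list text as parts.txt next to the part files before running the last command.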

Also maybe: https://unix.stackexchange.com/questions/182602/trim-audio-file-using-start-and-stop-times

Also: https://askubuntu.com/questions/927308/how-to-crop-edit-mp3-files

2021-06-11

Using <a href="…" download="filename.mp3"> sometimes works and sometimes does not with archive.org. Chrome and Firefox both state that it shouldn't work for cross-origin requests; perhaps archive.org sometimes sends the "right" header and sometimes does not.

A possible solution would be to download the data in the client as a blob, then save it. Idea from this MS polyfill and this SO about saving a blob:

https://github.com/jelmerdemaat/dwnld-attr-polyfill/blob/master/src/download-polyfill.js

https://stackoverflow.com/questions/25547475/save-to-local-file-from-blob

Making Ad-free Version of an Episode

Note: the ffmpeg executable must be in the same folder as the Swift script, ~/Projects/web/mystery-theater-browser/tools. You downloaded a precompiled version from here even though it's not compiled for ARM/Apple silicon. You had to manually open it from the Finder to get the "unverified; run anyway?" prompt before the script was allowed to run it.

  1. Open MP3 in Audacity
  2. In Audacity, use seek (Transport > Scrubbing > Seek; you assigned to S key) to find boundaries of commercial breaks; it will play the area around the cursor as you move it. Use play cut preview (UI unknown but has default shortcut C key) to play the area just before and after the selection. Unfortunately, switching between those modes clears the selection. Use Command-E to fit the view to the selection and Command-F to fit the view to the full length.
    1. "Acts" are about 12 minutes long
    2. Commercial breaks are 2-3 minutes long
  3. For each break you find, hit Command-B (or Edit > Labels > Add Label at Selection). You may but are not required to type in a label.
  4. When all sections to remove have been so labeled, select File > Export > Export Labels…
  5. Open a terminal window in the tools path of the mystery-theater-browser project, e.g. ~/Projects/web/mystery-theater-browser/tools
  6. Start typing a command in the terminal using the script to create the new (but not-re-encoded) MP3:
    ./mp3-extract.swift <episode #> <in.mp3> <exported labels file> ...

    e.g.:
    ./mp3-extract.swift 57 /Volumes/Time\ Machine/mystery-theater/cbsmrt-ken-long-collection/br/740608\ The\ Fall\ Of\ The\ House\ Of\ Usher\ WOR.mp3 /Volumes/Time\ Machine/mystery-theater/cbsmrt-ken-long-collection/br/740608\ The\ Fall\ Of\ The\ House\ Of\ Usher\ WOR.txt
    
    1. This command is for specifying the chunks to exclude.
    2. The episode # is used for automatically renaming the resulting MP3.
  7. Once complete, hit return to execute the command. A new MP3 will be created in the same directory as the source MP3.
  8. Listen to the resulting recording to confirm no content has been lost. It should be around 43 to 47 minutes long.
  9. Copy the resulting recording to the appropriate directory on the "Time Machine" external drive: /Volumes/Time Machine/mystery-theater/nrl-no-ads
  10. Upload to your collection in archive.org by going to https://archive.org/details/cbsrmt-nrl-ad-free-collection, clicking edit then clicking I want to change the files - or try clicking this
  11. The "CBS Radio Mystery Theater" spreadsheet is the "source of truth" and should be updated:
    1. Specify that this file has a no-ads variant by opening the "CBS Radio Mystery Theater" spreadsheet, navigating to the "Ken Long URLs" "NRL Ad Free" sheet, and:
      1. Putting the episode number in the first column, Episode ID
      2. Putting the original URL to the version with ads in the second column, Original URL
      3. Pasting the file name, without the .mp3 extension, in the third column, NRL Filename. For the above, that would be the string 0057 740608 The Fall Of The House Of Usher WOR (no ads). In the same row, add a 1 under the "preferred for episode" column (or a 2 if there are other usable recordings)
      4. Pasting the contents of the exported labels file in the fourth column, Sections Excluded. This is in case at some point in the future an episode needs to be edited or a better method for splicing episodes becomes available.
    2. Add a description and tags to the episode in the "NRL Descriptions" sheet. Current tags are:
      1. "sci-fi"
      2. "whodunnit"
      3. "thriller"
      4. "paranormal"
      5. "crime"
      6. "drama"
      7. "mystery"
      8. "supernatural"
      9. "murder"
      10. "psychological"
    3. If the audio of the episode is of acceptable quality, put a 1 for "best audio quality." If it's listenable but has a significant issue (part of it is missing or hard to hear) give it a -1. If its quality is too low to be enjoyable, give it a -2.
    4. If the episode (plot, acting, writing, etc.) is especially enjoyable, give it a 1 under Recommended
  12. Export the spreadsheet as a CSV to ~/Projects/web/mystery-theater-browser-content-support
  13. Run the "Mystery Theater Scraper" project (located in ~/Projects/Experiments 2021/Mystery Radio Scraper). It will parse the CSV files at the path exported to above and output the content files that the site can read.

If you want to add an episode but don't need to create an ad-free version, you can follow the steps above starting with step 10, but don't put anything in the ad-free variant column.
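
For reference, a minimal sketch of the "labels in, kept ranges out" step, in Python. This is not mp3-extract.swift, just an illustration of the idea. It assumes the exported labels file is tab-separated with start and end times in seconds followed by an optional label, and that you already know the total length of the recording. The function names and sample times are made up.

```python
def parse_labels(text):
    """Parse exported label text into sorted (start, end) pairs in seconds."""
    cuts = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            start, end = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # not a start/end line (assumed safe to skip)
        cuts.append((start, end))
    return sorted(cuts)


def keep_ranges(cuts, total):
    """Invert the sections to exclude into the (start, end) ranges to keep."""
    keep = []
    cursor = 0.0  # end of the most recent excluded section
    for start, end in cuts:
        if start > cursor:
            keep.append((cursor, min(start, total)))
        cursor = max(cursor, end)  # also handles overlapping labels
    if cursor < total:
        keep.append((cursor, total))
    return keep


if __name__ == "__main__":
    # Hypothetical export: two commercial breaks, the second one unlabeled.
    sample = "720.0\t890.5\tbreak 1\n1610.25\t1780.0\t\n"
    for start, end in keep_ranges(parse_labels(sample), total=2700.0):
        print("keep {:.2f} to {:.2f}".format(start, end))
```

Each kept range could then be handed to ffmpeg with -ss and -to, as in the trim command earlier in these notes, and the pieces concatenated.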

What would be better:

A tool that could:

  • Preview all the available MP3s
  • Edit the CSVs directly
  • Allow you to open an MP3 and select segments to remove & export new MP3
  • Run the scraper process

Issue

Discovered (from a review on archive.org) that two episodes from the BoA collection didn't download properly and weren't titled properly. Both contained $. Assuming some issue with a Bash script (is that what I used?).

Incorrect filenames:

  • /CBSRMT-791114-1030-The-,000-Error-(128-44)_no-id-{BoA} 2.mp3
  • /CBSRMT-810415-1185-The-Fatal-,000-(128-44)_KQV-{BoA} 2.mp3

Correct filenames:

  • /CBSRMT 791114 1030 The $999,000 Error (128-44)_no id {BoA}.mp3
  • /CBSRMT 810415 1185 The Fatal $50,000 (128-44)_KQV {BoA}.mp3
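
If a shell script really was involved, one possible cause is that the shell treated the $ and the digits after it as a parameter and expanded it to nothing. A minimal sketch of guarding against that in Python, assuming the download command is assembled as a string; the use of curl, the function name, and the example.org URL are assumptions, not what you actually ran.

```python
import shlex


def download_command(url, dest):
    """Build a curl command line as one shell-safe string."""
    # shlex.quote wraps anything with shell-special characters in single
    # quotes, so "$999,000" reaches curl literally instead of being expanded.
    return "curl -L -o {} {}".format(shlex.quote(dest), shlex.quote(url))


if __name__ == "__main__":
    print(download_command(
        "https://example.org/episode.mp3",  # placeholder URL
        "CBSRMT 791114 1030 The $999,000 Error (128-44)_no id {BoA}.mp3",
    ))
```

Passing the arguments as a list to subprocess.run, with no shell involved at all, would avoid the quoting question entirely.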

Transcribing

You installed whisper. General usage:

~/Projects/other-experiments/whisper $ whisper --model [tiny or small].en --language en [path.mp3]

Creates text files at [path.mp3.txt] and [path.mp3.vtt], where the .txt is just the text, and the .vtt is a human-readable captioning format with time stamps. The small model is good, the tiny model is fast.

Of interest for two reasons:

  • Could be used to transcribe entire episodes fairly accurately for searching and presentation on site. But the small model is about 1:1 for time-to-transcribe, that is, it takes an hour to transcribe an hour of audio.
  • Could be used to more quickly find ads. The tiny model seems to take about 10 minutes for an episode. Then you can search for "be back" or "for act" to try to find commercial breaks and see their time stamps. This actually works well - much more pleasant and faster to scan text than to scrub audio.

There's a Python library. It gets you close to, but not quite all the way to, transcribing an episode, finding the ad breaks, and trimming them automatically. For example, time stamps don't take into account music; lines don't care about different voices (I'll be back for Act II shortly. Your local Buick dealer is offering…).
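
The "search the transcript for be back / for act" trick can be sketched in Python against the .vtt output. This assumes each cue is a time stamp line of the form MM:SS.mmm --> MM:SS.mmm (optionally with leading hours) followed by its text; the function names and the sample cues are made up. It only points you at candidate break times, and you would still confirm them by ear.

```python
import re

# Matches "MM:SS.mmm --> MM:SS.mmm", with an optional leading "HH:".
CUE = re.compile(r"^((?:\d+:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d+:)?\d{2}:\d{2}\.\d{3})")


def to_seconds(stamp):
    """Convert 'MM:SS.mmm' or 'HH:MM:SS.mmm' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def find_phrases(vtt_text, phrases=("be back", "for act")):
    """Return (cue start in seconds, line) for each line containing a phrase."""
    hits = []
    start = None  # start time of the cue currently being read
    for line in vtt_text.splitlines():
        match = CUE.match(line.strip())
        if match:
            start = to_seconds(match.group(1))
        elif start is not None and any(p in line.lower() for p in phrases):
            hits.append((start, line.strip()))
    return hits


if __name__ == "__main__":
    # Made-up cues in the assumed format.
    sample = (
        "WEBVTT\n\n"
        "11:58.000 --> 12:03.000\n"
        "I'll be back for Act II shortly.\n\n"
        "12:03.000 --> 12:09.000\n"
        "Your local Buick dealer is offering\n"
    )
    for seconds, text in find_phrases(sample):
        print("{:.1f}s: {}".format(seconds, text))
```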