Monday, March 10, 2025

[MusicViz Project] Part 3 - Breakdown of FFMPEG "showcqt" Experiment

This is the third installment of my ongoing series of posts on one of my long-term projects to develop a new automated technique for visualising music.

To motivate today's discussion, here is the final video clip rendered from the experimental technique being discussed in this post: 

https://www.youtube.com/watch?v=LTMpzmdI9qQ

 

And here is the command used to render that:

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=500x1920:axis=0:cscheme=0.6|0.7|0.1|0.1|0.8|0.5,crop=500:1392:0:4000,setsar=1,transpose=2[vcqt]; [0:a]showwaves=mode=cline:s=1920x100[vs]; [vcqt][vs]overlay=y=H-100[v]" -map "[v]" -map "0:a" -c:a aac "mv_20230803.mp4"

 



A breakdown of this command-line into its constituent parts:

* 1) `ffmpeg` - This part is just the self-explanatory bit calling the FFMPEG binary to do the work

* 2) `-y` - As per `ffmpeg --help`, this just tells FFMPEG that it has permission to overwrite the output file without nagging about it later

* 3) `-i 20230803-v02.flac` - This part tells FFMPEG the input audio-file to use for all this filter-magic

 ---

* 4) `-filter_complex "..."` - This part uses FFMPEG's filter-graph DSL to define the filter graph that is run to produce the desired results

   ------------

** 4.1) `[0:a] showcqt=...[vcqt];` - This part runs the "showcqt" filter on the given audio stream ("0:a"), rendering the results into video-buffer "vcqt". This generates the main "dynamic note-scroller/plotter" part of the video. Further details of how this part is put together can be understood from the development steps later - A summary of what these bits do follows:

*** 4.1.1) "showcqt" filter options...

**** 4.1.1.1) `s=500x1920` - Video size for the showcqt output. This is the pre-rotation size (the crop + transpose steps later turn it into a wide horizontal strip)

**** 4.1.1.2) `axis=0` - Don't show the pitch names "axis" ladder

**** 4.1.1.3) `cscheme=0.6|0.7|0.1|0.1|0.8|0.5` - Spectrogram colour scheme, expressed as RGB tuple floats for left then right sides (i.e. "left_r|left_g|left_b|right_r|right_g|right_b")

   ........

*** 4.1.2) `crop=500:1392:0:4000,setsar=1,transpose=2` - Cropping + Rotating of showcqt-output

**** 4.1.2.1) `crop=500:1392:0:4000` - IIRC, this is set this way (from experimental testing) to shift the start point (i.e. where the blobs get emitted from / where the axis would have sat) closer to the left of the frame

**** 4.1.2.2) `setsar=1` - This sets the "sample aspect ratio" (i.e. the pixel aspect ratio) to 1:1

**** 4.1.2.3) `transpose=2` - This does the horizontal/vertical flip (i.e. a 90° counter-clockwise rotation of the frame)

    ------------

** 4.2) `[0:a] showwaves=...[vs];` - This part runs the "showwaves" filter on the given audio stream ("0:a"), rendering the results into video-buffer "vs". This generates the secondary "waveform" strip of the visualisation (i.e. the time-domain amplitude of the signal plotted across the horizontal axis for each frame; `mode=cline` draws it as a filled, vertically-centered line)

** 4.3) `[vcqt][vs]overlay=y=H-100[v]` - This part takes video streams "vcqt" and "vs" and runs the "overlay" filter, which draws the second input ("vs") on top of the first ("vcqt"). The overlay is placed so that its "y" position is "H-100", where "H" is the height of the main input "vcqt" and 100 is the height of "vs" (as per the height in its "s" parameter), i.e. flush with the bottom edge of the frame

 ---

* 5) The `-map` options select which streams get written into the final output file (a quick way to verify this is shown just after this breakdown):

** 5.1) `-map "[v]"` - This maps the filter-graph output labelled "[v]" to be the video stream written to the output file

** 5.2) `-map "0:a"` - This is the analogous audio part: it selects the audio stream(s) of input 0 (i.e. the original FLAC) to be written to the output file

 ---

* 6) `-c:a aac` - This tells FFMPEG to encode the output audio stream with the "AAC" codec (i.e. the FLAC input gets transcoded to AAC, a codec the MP4 container supports)

* 7) `mv_20230803.mp4` - This is the bit which tells FFMPEG what the (final) output filename is.
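
(As a quick sanity check - this isn't from my original notes, just a verification step I'd suggest - you can use `ffprobe` to confirm that those `-map` options landed the expected streams in the output container, i.e. one video stream plus one AAC audio stream:)

ffprobe -v error -show_entries stream=index,codec_type,codec_name mv_20230803.mp4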

 

First, a disclaimer about why it's taken so long to get a new episode out:

I'd originally been planning on getting this post out a lot earlier, but ended up having to put most of my projects on hold after getting covid last year (and then having to prioritise getting adequate rest while recovering from that), followed later by a few hectic months at the end of last year (i.e. several big trips, followed by a protracted + messy + really-stressful renovation project, culminating in another bout of sickness). And now... I'm back to writing this on the tail end of yet another bout of sickness! (Ugh, this is starting to get really old!)

And yeah, between all that and, err... me procrastinating on tackling the slightly daunting process of going back through my notes, trying to break down the evolutionary steps, and capturing screenshots / uploading sample videos to illustrate how this worked (i.e. a whole lot of work, which also required a bit more disk space than I had left over on my workstation at the time, and subsequently more time to collate + document everything)... yeah, it didn't exactly inspire motivation!

~~~

Anyway, I finally decided to try to make some progress on this again. It turns out that I did have a whole bunch of notes on some of the steps, just maybe without accompanying videos for all of them. (That said, for the later steps, these notes are not as thorough - they become a bit of a mad-dash hacky scramble to get things working, and are all out of order)

So, in a change of tack, what I'll likely do is just make an initial dump of these notes as-is, getting them in order as much as possible, with whatever commentary I can add about the motivation for each step; then maybe someday I can come back and attach the videos / screenshots to these. (I can see that I have some of these available, but others will need regenerating and/or a bit more context - particularly for stuff like the colour tweaking, which to this day I don't quite understand / cannot quite control as well as I need to for actually using this stuff)

 

[20250309] So yeah... the following section may continue to be in a state of quasi-permanent "WIP"-ness for the foreseeable future... but hopefully one day it will be looking more complete.


1) Initial Steps - Investigating Available Options

This whole bunch of work came about because I was going through all the existing tools to check whether there was something already close to what I want. (Answer: Not quite... at least not in anything turned up by this round of work... and that gap has kind of been blocking any further work from going on)


# abitscope - https://hhsprings.bitbucket.io/docs/programming/examples/ffmpeg/audio_visualization/_abitscope_.html
--------------------------------------------------------------------------------------------------------------

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]abitscope=s=1920x1080[v]" -map "[v]" -map "0:a" -c:a mp3 "abitscope_demo.mp4"



# showspectrum rainbow - https://hhsprings.bitbucket.io/docs/programming/examples/ffmpeg/audio_visualization/_showspectrum_color_rainbow_legend_1_.html
--------------------------------------------------------------------------------------------------------------

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showspectrum=s=1920x1080:color=rainbow:legend=1[v]" -map "[v]" -map "0:a" -c:a mp3 "spectrum_rainbow_demo.mp4"

Ugh yeah... this basically just plots out a giant rainbow spectrogram, from left to right, and looks like a really ugly science-y debug + analysis tool


# showcqt - https://hhsprings.bitbucket.io/docs/programming/examples/ffmpeg/audio_visualization/_showcqt_.html
--------------------------------------------------------------------------------------------------------------

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=1920x1080[v]" -map "[v]" -map "0:a" -c:a mp3 "showcqt_demo.mp4"
 

Ooh... this looks more promising! (i.e. it can possibly be adapted for the long horizontal clouds + little "pips" in the sky). That is, it does a whole lot of what we want, but it looks really crude (see notes later), so it will need tweaking!


# showcqt (crop between A0(21) and C8(108)) - i.e. the standard range for most playable/useful pitches
# https://hhsprings.bitbucket.io/docs/programming/examples/ffmpeg/audio_visualization/_showcqt_crop_between_a0_21_and_c8_108_.html
---------------------------------------------------------------------------------------------------------------

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=1920x1080,crop=1392:1080:88:0,setsar=1[v]" -map "[v]" -map "0:a" -c:a mp3 "showcqt_cropped_demo.mp4"


Clamping the range like this means that more space is dedicated to the "signal" (i.e. the playable-notes range for most instruments... i.e. the thing we're actually interested in), instead of wasting it on overtones at the extremes of what humans can hear.
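
(Side note, added while writing this up: per the showcqt docs, the filter also exposes "basefreq" / "endfreq" options, so in principle the same clamping could be done in frequency-space rather than by pixel-cropping. An untested sketch, using A0 ≈ 27.5 Hz and C8 ≈ 4186 Hz:)

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=1920x1080:basefreq=27.5:endfreq=4186[v]" -map "[v]" -map "0:a" -c:a mp3 "showcqt_basefreq_demo.mp4"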


2) Starting to Customise the "showcqt" Visualisation

The first order of business here was to start making this "showcqt" output look a bit more sophisticated, and more like what we are after (i.e. something more akin to a horizontal-scrolling "Chinese Scroll Painting" of long held pitch lines, with smaller "dibs" for shorter plink-plonks... RATHER than the default, which looks like a crude + ugly "programmer art" prototype of one of those "MIDI Keyboard Falling-Icicle Speed-Run" videos often found in certain corners of YouTube)


# First attempts at rotating showcqt
------------------------------------

(NOTE: This one is just experimenting with the effects on another piece of music I've not yet released online. While interesting to see, I don't want that piece's first public appearance to be with an inappropriate / incomplete visualisation, hence this clip won't be posted for now)


ffmpeg -y -i 20200301-01-Radar.flac -filter_complex "[0:a]showcqt=s=1080x1920,crop=1080:1392:0:4000,setsar=1,transpose=2[v]" -map "[v]" -map "0:a" -c:a mp3 "showcqt_radar_demo.mp4"


Repeating the same thing on the piece we've been working with all along now...


ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=1080x1920,crop=1080:1392:0:4000,setsar=1,transpose=2[v]" -map "[v]" -map "0:a" -c:a mp3 "showcqt_rotated.mp4" 


Key Points Here:

* Needed to rotate/transpose so that instead of pitches laid out horizontally with notes plotted falling down, we instead have pitches vertical (higher = higher up) and time along the long horizontal axis, with notes being added at one end and "scrolling along" the frame once they've been played

* A lot of the scaling and cropping stuff then becomes trying to simultaneously:

   1) Reduce the "dead space" that occurs "above" the line (when in the default vertical-dropping orientation, where that dead space is usually reserved for an image of a keyboard OR the score or something similar)

   2) Try to pull that start line towards the left side of the frame (instead of it only starting somewhere between 1/3 and 1/2 of the way along), maximising the time the plots spend scrolling along the frame, rendering nice long lines

* Also, at some point, we needed to turn off that ugly pitch-names label line, as it was distracting + looked nasty (i.e. by adding the "axis=0" part)
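
(One workflow tip, adapted from the ffplay example in the showcqt docs rather than from my own notes: this kind of trial-and-error tweaking can be previewed live, instead of rendering a full file each time, e.g.:)

ffplay -f lavfi "amovie=20230803-v02.flac,asplit[a][out1]; [a]showcqt=s=1080x1920,crop=1080:1392:0:4000,setsar=1,transpose=2[out0]"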

 

# Trying out different colour schemes
------------------------------------

Unfortunately, I no longer have detailed notes about what I was trying to do with each step here, but as you can infer from the following examples while consulting the docs, it's something like:

Balance the "RGB" (0.0 - 1.0) tuples so that the "left" and "right" channel colours end up giving the resulting waveform the colour-cast we're after. (NOTE: Getting this right turns out to be a very tricky and unintuitive game of guesswork, where getting it wrong just sees everything fade into the overly bright "white" default glowy colours)

 

Some of the sample colour-schemes still found in my scrap notes:

@REM  ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=1080x1920:axis=0:cscheme=0.2|0.8|0.1|0.1|0.8|0.5,crop=1080:1392:0:4000,setsar=1,transpose=2[v]" -map "[v]" -map "0:a" -c:a mp3 "mv_20230803.mp4"

@REM ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=1080x1920:axis=0:cscheme=0.6|0.7|0.1|0.1|0.8|0.5,crop=1080:1392:0:4000,setsar=1,transpose=2[v]" -map "[v]" -map "0:a" -c:a mp3 "mv_20230803.mp4"

@REM ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=500x1920:axis=0:cscheme=0.6|0.7|0.1|0.1|0.8|0.5,crop=500:1392:0:4000,setsar=1,transpose=2[v]" -map "[v]" -map "0:a" -c:a mp3 "mv_20230803.mp4"
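
(In hindsight, one diagnostic that might have made this guesswork less blind - though I never actually tried it at the time, so treat this as a hypothetical sketch - is to give the two channels pure primary colours, making it obvious which side contributes what to the final blend:)

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showcqt=s=500x1920:axis=0:cscheme=1|0|0|0|0|1,crop=500:1392:0:4000,setsar=1,transpose=2[v]" -map "[v]" -map "0:a" -c:a mp3 "cscheme_debug.mp4"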

3) Stacking the Pitch-Scroll and the Waveform Strip

While I know there was an explicit reason why I originally added the waveform, I can't remember what that exact reason was now.

I suspect it was something like: "We need some more dynamic action", or something about trying to also get a sense of the intensities of the various frequencies involved (since that info gets lost as we try to avoid everything blowing out to white, which constrains our ability to show intensity by adjusting colours)

Nevertheless, it ended up being a dance of parameter adjusting, scaling the waveform down into a narrow sliver that could be overlaid across the bottom of the frame


NOTE: The waveform was developed in isolation first, and then combined + moved into place (through careful scaling + sizing)
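
(I no longer have the exact command from that isolated stage, but reconstructing it from the final filter-graph, it would have been something close to:)

ffmpeg -y -i 20230803-v02.flac -filter_complex "[0:a]showwaves=mode=cline:s=1920x100[v]" -map "[v]" -map "0:a" -c:a aac "showwaves_iso_demo.mp4"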

 

Next / Future Steps

So what's next?

Well, following further experiments (trying to visualise other pieces), it quickly became obvious that the main bottleneck was the "showcqt" filter itself: it was not quite doing everything needed, and/or not in the ways I really wanted it to, with a bunch of behaviours (especially that blown-out "white" colour-blending thing) essentially requiring the source code to be modified in order to change them.

And so began the next tranche of work, which I haven't managed to make much progress on so far: i.e. trying to jump through whatever specialised-setup hoops the FFMPEG devs have decreed you need to get their source compiling...

FFMPEG being a long-lived, Linux-first C/C++ project, this almost always involves some combination of make, POSIX-utility / environment expectations, path hacks, etc. (and often a dose of some weird old autoconf-era thing; though CMake isn't much better either if your local setup sucks for some reason)

Sigh... as much as I do love coding in C/C++ (as I have done for most of my paid work), what I do NOT love is trying to get random projects written in these languages compiling on a random machine. Especially when said host machine runs Windows. (While that may be changing this year with the forced Win 10 -> Win 11 switcheroo, we're not quite there yet; not to mention that making such a change comes with a whole bunch of major potential issues... including losing reliable access to all my current backups / randomly corrupting any of those disks that come within striking distance of Linux machines, having been burned badly on that front quite a few times in the past!)

So yeah... that's where we're at now: stuck trying to set aside time to work out the best way of compiling the sources (and really, I only want / need to compile that one file). Having read the sources for that file, the other option that suggests itself is to just reimplement the relevant stuff in another language, making it easier to use other graphics libs for all the later drawing, while still calling out to FFMPEG to do all the heavy lifting...

    EDIT: Checking my notes, this is where HTML/JS ports of this stuff start looking attractive, even though I've largely tried to stay away from that route in the past. In particular, promising things include libraries like https://github.com/mfcc64/showcqt-element

Short of a big-bang discovery out of left field, the next post in this series will likely cover the custom rendering engine I've started building, which is where this whole thing was heading eventually one way or another!
