Bonsai - Dev Overview
December 28, 2024
Bonsai is a platform for recording artists to elevate the fan experience and turn a moment into something worth showing off. It facilitates the connection between an artist and their fans across multiple platforms, giving artists better insight into their fanbase and allowing them to communicate with their audience in a creative fashion.
One of Bonsai's core features is its audiograms: superfans can ask their favorite artists anything by engaging with their content and communities (YouTube, Discord, etc.), and artists can reply with short audio recordings which, when paired with our tech, generate bonsais, short audio visualizers that can easily be shared across social platforms.
I was initially approached to upgrade the existing audio visualizer.
Approach
The existing version of the visualizer was built on the HTML canvas from a vanilla yet complex network of line points, connected and manipulated based on audio data to give the desired effect. My goal was to tackle three priorities:
- Simplify the visualizer implementation
- Bring the final implementation closer to the original designs
- Make the visualizer more responsive in relation to the incoming audio data
As this was all being done client-side, I chose Three.js. It let us maintain some familiarity by continuing to use the HTML canvas, while providing pre-built geometries that could be more easily configured and further extended through shaders. This flexibility was important: Bonsai already had three different visualizer presets, and I wanted to ensure there was a structure in place for creating future ones.
As the code is proprietary, I am unable to share code snippets or go into depth on the implementation; however, there are a couple of points I can touch upon.
Web Audio API
To handle audio recording and frequency data I used the Web Audio API, which was already in use in the previous implementation. Most of the logic there remained the same; however, I used my musical background 😎 to add some additional flair to the data. The main change was the addition of a Savitzky-Golay filter to smooth out the frequency data (this was particularly useful for the waves preset, which you can see further below), giving adjacent geometries in a preset a more uniform look.
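As a rough illustration of the smoothing step (not the production code), a Savitzky-Golay pass over one frame of frequency data could look like the sketch below. The 5-point window, the quadratic fit behind the coefficients, the edge handling, and the name `smoothFrequencyData` are all assumptions made for the example; the input is expected to be something like the array filled by `AnalyserNode.getByteFrequencyData()`.

```typescript
// 5-point Savitzky-Golay smoothing coefficients for a quadratic fit.
// They sum to 35, which is why the result is divided by 35.
// Window size and polynomial order are assumptions for this sketch.
const SG_COEFFS: number[] = [-3, 12, 17, 12, -3];
const SG_NORM = 35;

// Smooths one frame of frequency data and returns a new array.
function smoothFrequencyData(bins: ArrayLike<number>): number[] {
  const n = bins.length;
  const half = (SG_COEFFS.length - 1) / 2;
  const out: number[] = new Array<number>(n);
  for (let i = 0; i < n; i++) {
    let acc = 0;
    for (let k = -half; k <= half; k++) {
      // Assumed edge policy: clamp the index so the end values are repeated.
      const j = Math.min(n - 1, Math.max(0, i + k));
      acc += SG_COEFFS[k + half] * bins[j];
    }
    out[i] = acc / SG_NORM;
  }
  return out;
}
```

Unlike a plain moving average, this filter keeps the height of peaks reasonably intact, which matters when the data drives something visual. Because two of the coefficients are negative, the output can dip slightly below zero next to a sharp spike, so clamping the result before using it as a line height may be worthwhile.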
Three.js
For those already familiar with Three.js, the implementation was very straightforward. All the available presets were created using Line2 (so that we had line thickness support). Most of the effort in this step went into ensuring that:
- frequency data was properly distributed across geometries
- we only generated enough geometries to fit the visible screen while also adapting to the available audiogram ratios (1:1 and 9:16)
- the audio visualizer was as optimized and performant as possible (done through simple techniques such as properly clearing all unused geometries to prevent dead references from remaining in memory, reducing GPU usage).
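The first two points can be sketched with a few pure functions. This is an illustration under assumptions rather than the actual implementation: `visibleLineCount`, `canvasSize`, `distributeBins`, and the `spacingPx` parameter are hypothetical names, and averaging each group of bins is just one reasonable way to spread the data across lines.

```typescript
// How many lines fit in the visible width. `spacingPx` (distance between
// line centers) is a hypothetical layout parameter.
function visibleLineCount(widthPx: number, spacingPx: number): number {
  return Math.max(1, Math.floor(widthPx / spacingPx));
}

// Canvas size for one of the two audiogram ratios at a fixed width.
function canvasSize(
  widthPx: number,
  ratio: "1:1" | "9:16"
): { width: number; height: number } {
  const height = ratio === "1:1" ? widthPx : Math.round((widthPx * 16) / 9);
  return { width: widthPx, height };
}

// Splits the frequency bins into `lineCount` contiguous groups and averages
// each group. With at least as many bins as lines, every bin is used exactly
// once; with fewer bins than lines, neighbouring lines reuse the same bin.
function distributeBins(bins: ArrayLike<number>, lineCount: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < lineCount; i++) {
    const start = Math.floor((i * bins.length) / lineCount);
    const end = Math.max(start + 1, Math.floor(((i + 1) * bins.length) / lineCount));
    let sum = 0;
    for (let j = start; j < end && j < bins.length; j++) sum += bins[j];
    out.push(sum / (end - start));
  }
  return out;
}
```

For example, `distributeBins([1, 2, 3, 4, 5, 6, 7, 8], 4)` yields `[1.5, 3.5, 5.5, 7.5]`. On the cleanup side, Three.js geometries and materials expose a `dispose()` method, which is the usual way to release GPU resources for objects removed from the scene.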
The end result for the three presets can be seen below:
Going beyond
Upon completing the audio visualizer 2.0, we discussed as a team how we could take the audiograms to the next level and create a better user experience. We decided to completely remove the visualizer from the client side, leaving only the recording capabilities on the frontend. For the visuals, our goal was a serverless solution that would take in the recorded audio, generate the audiogram in the cloud, and return the final product to the user as an mp4. This would decouple our tool from the constraints of the user's GPU and browser. If everything worked as expected, our solution would be much more stable, simplify the entire audiogram generation process, and give us better control over frame rate.
Once again, as the solution is proprietary, I can only provide a brief overview of the approach to the audio visualizer 3.0.
To develop our solution, we opted to use Rust, an incredibly fast and memory-efficient low-level programming language. As we were no longer working with browsers, Three.js, which is built on top of WebGL, was no longer a viable solution. Instead, we had to look at tools closer to native OpenGL, and ultimately decided to do all our visual development using wgpu. Once we had these tools established, the development itself mainly centered around migrating the logic of the existing implementation while adapting to the nuances of the Rust language.
The most challenging part up to this point (the whole project is still a work in progress as of writing this post) was defining the appropriate architecture for the microservice that we were developing. After multiple iterations, we landed on the following stack:
- Docker - to containerize our audiogram generator
- LambdaLabs - to host our Docker container while providing the required GPU resources
- AWS SNS - for inter-process communication and triggering the appropriate resources
- AWS S3 - for storing all generated audiograms and accessing the pertinent data
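To make the flow between these pieces a little more concrete, here is a purely hypothetical sketch of how a render job could travel through SNS and point at files in S3. The `SnsEnvelope` fields are a subset of the JSON envelope SNS delivers to HTTP(S) and SQS subscribers (with raw message delivery off); everything about `AudiogramJob`, `parseJob`, and `outputKey`, including the field names and the key naming scheme, is an assumption for illustration and not our actual message format.

```typescript
// Subset of the JSON envelope SNS sends to HTTP(S) and SQS subscribers.
interface SnsEnvelope {
  Type: string;
  MessageId: string;
  TopicArn: string;
  Message: string; // the publisher's payload, delivered as a string
}

// Hypothetical job payload; every field name here is an assumption.
interface AudiogramJob {
  bucket: string;
  audioKey: string;
  preset: string;
  ratio: "1:1" | "9:16";
  fps: number;
}

// Unwraps the SNS envelope and validates the (assumed) job payload.
function parseJob(rawBody: string): AudiogramJob {
  const envelope = JSON.parse(rawBody) as SnsEnvelope;
  if (envelope.Type !== "Notification") {
    throw new Error(`unexpected SNS message type: ${envelope.Type}`);
  }
  const job = JSON.parse(envelope.Message) as AudiogramJob;
  if (job.ratio !== "1:1" && job.ratio !== "9:16") {
    throw new Error("unsupported ratio");
  }
  if (!(job.fps > 0)) throw new Error("fps must be positive");
  return job;
}

// S3 key under which the rendered mp4 would be stored: the audio key with
// its extension swapped for .mp4 (an illustrative naming scheme).
function outputKey(job: AudiogramJob): string {
  return job.audioKey.replace(/\.[^./]+$/, "") + ".mp4";
}
```

Keeping the message down to a pointer into S3 plus a few render options means the GPU host only needs read access to the audio and write access for the result.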
As a result of our chosen architecture and coding language, we ended up with the following audiogram (mp4 snippet converted to gif).
Results
So, what were the outcomes of audio visualizer 3.0?
First and foremost, we successfully decoupled our tool from the user's browser and GPU. This means that the audiogram generation has been standardized; users with slow computers no longer need to worry about a slow user experience as all intensive processes are executed outside their environment.
Second, we've simplified multiple internal processes, particularly those pertaining to how audiograms were being generated previously through our AWS tooling. All of this potentially reduces usage costs and makes our integrations easier to maintain.
Finally, we have much better control over frame rate. This might not be obvious from the final output; however, because of the way we've built this visualizer, we can define the frame rate we would like for our generated mp4. We've chosen to stick with 60fps as it works well for our purposes, but having the option to increase or decrease this value without having to worry about frame drops is a fantastic achievement.
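One way to see why the frame rate becomes a free parameter: when rendering offline, each frame can be tied to a fixed slice of the audio rather than to a real-time clock, so a slow frame just takes longer to render instead of being dropped. The sketch below illustrates that idea only; the real renderer is written in Rust, and `planFrames` and `FrameSlice` are hypothetical names.

```typescript
// One video frame and the slice of audio it visualizes.
interface FrameSlice {
  index: number;
  timeSec: number; // presentation time of the frame in the video
  startSample: number; // first audio sample analysed for this frame
  endSample: number; // one past the last sample analysed
}

// Maps every output frame to a fixed range of audio samples, independent of
// how long each frame takes to draw.
function planFrames(totalSamples: number, sampleRate: number, fps: number): FrameSlice[] {
  const frameCount = Math.ceil((totalSamples * fps) / sampleRate);
  const slices: FrameSlice[] = [];
  for (let i = 0; i < frameCount; i++) {
    const startSample = Math.floor((i * sampleRate) / fps);
    const endSample = Math.min(totalSamples, Math.floor(((i + 1) * sampleRate) / fps));
    slices.push({ index: i, timeSec: i / fps, startSample, endSample });
  }
  return slices;
}
```

With one second of 48 kHz audio, `planFrames(48000, 48000, 60)` gives 60 frames of 800 samples each, and changing the last argument to 30 gives 30 frames of 1600 samples each. Nothing else in the pipeline needs to change.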