How I learned to stop worrying and love Machine Learning

AI is actively changing the usual patterns of how users interact with digital products. For designers who want to stay ahead in this dynamic field, it’s important to understand the basics of AI. But don’t be afraid — it’s more accessible than it may seem at first glance. I will prove it in this article.

Of course, my journey of learning the basics of AI was not as smooth as I would have liked. Most of the time, I had no idea what I was doing or how to fix the emerging errors. I overcame one frustration after another. One chocolate bar after another.

But first things first.

Mastering AI nowadays can be compared to mastering computer literacy at the end of the 20th century. Just as computer literacy was then key to many professions and spheres of life, AI is now becoming increasingly important and in demand. It is like basic hygiene for anyone involved in the digital industry.

Fear of AI overwhelm

I bet that every product designer has, at some point, felt the fear of falling behind due to a lack of knowledge of AI. Sound familiar? After all, AI is a vast and rapidly evolving domain. And this isn’t just limited to the IT community; the phenomenon of AI anxiety is widespread.

But I’m here to reassure you! Designers only need to focus on one area of AI — Machine Learning, or more specifically, Deep Learning.

Deep Learning is a subset of machine learning. It requires huge amounts of data and significant computational power, enabling it to excel in AI tasks where traditional ML might struggle.

Why focus on Deep Learning?

Deep Learning is the main technology used in digital products today.

Previously, AI areas such as natural language processing (NLP), computer vision, and audio processing were developed separately, but now Deep Learning is bringing them together by providing a common framework.

What can Deep Learning do presently? Illustration by Maria Kovalevich

All modern achievements in AI, such as GPT, AlphaFold, and DALL·E, are based on Deep Learning. Understanding this foundation will give you a clearer sense of what AI can and cannot do.

Deep Learning tools are becoming more accessible, allowing even those far from programming to experiment and apply them.

Why is it essential for product designers?

Not so long ago, designers celebrated their triumph as business owners finally recognized the importance of user feedback. Companies began to invest more in user research and take its results into account in product development.

It was a long and difficult journey, wasn’t it?

However, as the focus in the startup race rapidly shifts to integrating artificial intelligence, designers are once again on the losing end. Often, AI solutions can conflict with real user needs, challenging the designers’ focus on the user.

Illustration by Maria Kovalevich

By understanding the fundamentals of Deep Learning, designers will be able to take the initiative. They are uniquely positioned not only to improve the user experience but also to add significant value to products. It is through their skills that innovative solutions emerge, solutions that both fulfill users’ needs and make them fall in love with the product.

Recent examples of innovations using Deep Learning technologies

Just in the past few weeks, various companies and startups have announced several innovations that are, in their own way, minor breakthroughs. Here are three examples:

Spotify — Voice Translation. The company introduced the automatic translation of podcasts into different languages, which increased the accessibility of the product and greatly expanded its potential audience.

Arc Browser — Arc Max. Arc Max uses AI to rename pinned tabs and downloaded files for clarity, and it shows a summary of a link when you hover over it while holding the Shift key.

So now Arc Max gives users a more intuitive, context-sensitive, and customizable browsing experience that could serve as a benchmark of affordance for a web browser.

Sendforsign — AI signature generation. Automatic signature generation for contracts alleviates the primary headache associated with contract management.

I trust I’ve highlighted the critical role that understanding Deep Learning fundamentals plays for product designers.

Are you ready to dive into a step-by-step guide to learn the essentials?

Then read on for my step-by-step guide on how to do this, even if you’re just a mere mortal product designer.

The following story is based on my personal experience. I am far from being a programmer; I have only tried to learn the basics of SwiftUI, which I am still working on. So please don’t judge my terminology too harshly.

Step 1: Finding the right place to learn

As you might guess, I’m one of those designers who experienced the ‘fear of AI overwhelm’ I mentioned earlier.

In my search for a solution to this problem, I repeatedly came across the Fast.ai course Practical Deep Learning for Coders. Despite the word ‘coders’ in its title, some people claim that prior coding knowledge isn’t necessary.

So I decided to heed the signs of fate and try experiential learning (that is, learning a subject through practice), especially since the course is completely free of charge.

While I started with modest expectations, by the end of the first lesson, I was eager to build and train my first model.

The hallmark of the course is the rapid achievement of results, effectively removing the perceived knowledge barrier and — I hope — awakening a passion for Deep Learning.

Below, I describe my steps based on the knowledge I gained from the first lesson, where you are given ready-made code and access to a pre-trained model that you can feed any data to.

Step 2: Technical setup

To work on the model, I used the following resources:

Kaggle — an online platform for data science and machine learning.

In a nutshell, Kaggle gives you free access to a huge collection of datasets and to computing resources, including CPUs and GPUs (the graphics processors that make training deep learning models feasible). This means you can write code, analyze data, and build models without installing specialized software or owning a powerful machine.

Kaggle Notebooks — based on Jupyter Notebook, these provide an interactive environment where you can write and execute code, then share your analyses and results with the community, all within the Kaggle platform.

GPT-4 — needs no introduction 😃

Step 3: Identify a dataset to train the model on

In the first lesson, you are invited to build an image recognition model.

But before we get into the details, I want to briefly talk about the components of a dataset for Deep Learning models. A typical dataset consists of two parts*:

  1. Training Data (80%): the bulk of the data, from which the model learns by adjusting its internal parameters.
  2. Validation Data (20%): after certain training cycles, the model is tested on this data. It shows how well the model handles data it has not seen before and prevents it from simply memorizing the training set.
Illustration by Maria Kovalevich

*I have taken this data ratio from the model given in lesson #1 of the course. However, the actual ratio may vary depending on your task and resources.

Once you have trained and validated your model, you also need Test Data. The model never sees this data during training; it is used to evaluate the final performance and shows how the model will behave on brand-new data in real-world conditions.
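To make the split concrete, here is a minimal sketch of how such a split is often done in Python with scikit-learn. This is an illustration only, not code from the course; the file names and labels are made up.

```python
from sklearn.model_selection import train_test_split

# Made-up file names and emotion labels, just to illustrate the split
files = ["spec_001.png", "spec_002.png", "spec_003.png", "spec_004.png", "spec_005.png"]
labels = ["sad", "neutral", "sad", "neutral", "sad"]

# 80% of the data goes to training, 20% to validation (the ratio from lesson #1)
train_files, valid_files, train_labels, valid_labels = train_test_split(
    files, labels, test_size=0.2, random_state=42
)
```

In practice, libraries such as fastai (used in the course) can do this split for you with a single parameter, for example valid_pct=0.2, so you rarely have to write it by hand.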

So, let’s return to identifying a dataset for training the model.

For simplicity, the lecturer, Jeremy Howard, suggests teaching the model to recognize only two classes of objects (this is known as binary classification).

Images of birds and forests are used as examples.

However, since I am passionate about the topic of Affective Computing and have some knowledge in this field, I decided to go a step further and build a model for emotion recognition.

This, of course, made the task a bit more complex.

The most accessible option for me was to recognize emotions through voice. I knew that a sound recording could be converted into an image called a spectrogram.

Pro tip: Almost any kind of data can be converted into an image representation, even the viruses on your computer. Learn more about this in Jeremy Howard’s book “Deep Learning for Coders with fastai & PyTorch”.

The only thing left was to find a couple of dozen recordings to train the model.

Step 4: Finding the voices with true emotions

Where could I find voices imbued with various emotions?

My beloved husband came up with a brilliant suggestion: to record actors trained in Stanislavski’s system. But the thought of spending several evenings watching movies made me frantically search for alternatives.

(Plus, come on, actors? How can one trust such delicate matters as genuine emotions to actors?)

I decided to apply a habit from my design practice, reference searching, here too. Surely someone had already tried to create an emotion recognition model and used some data for it.

That’s when I discovered the wonderful world of free scientific resources. From open studies on ResearchGate to free datasets and libraries on Kaggle.

It turns out that a handful of scientists in academic circles had shared my husband’s brilliant idea and recorded several datasets of actors’ voices that are well known in scientific communities.

(Ah, so actors do portray emotions quite convincingly after all… Who would have thought!)

So, for my research, I chose the Toronto emotional speech set (TESS) dataset, which exclusively features female voices. I made this choice to maintain the purity of the experiment, given the significant distinction between male and female timbres.

Step 5: Converting sound to spectrograms

OK, I’ve found the dataset, but what do I do next? Manually convert each record into a spectrogram? That’s days of work again.

But once again, I was lucky enough to find the magical Speech Emotion Recognition project on Kaggle, where I not only discovered a library for batch converting all recordings into spectrograms but also gained insights on how to better structure my project.

It seemed to me that it would be better to divide the project into two parts, or in Kaggle terms, two notebooks: one for dataset preparation and one for model training.

Next, for writing the code, I turned to the GPT-4 chat for help. The first prompt looked like this (this prompt is a combination of all my previous attempts, and there were quite a few of them):

GPT-4 Prompt: “I’m using Jupyter Notebook on Kaggle. The task is to convert spectrograms from the TESS dataset that contain only sad and neutral voices and save them in separate folders within the current directory or notebook workspace. Please extract all the necessary information from this notebook [here I pasted the text of the speech emotion recognition notebook] to complete this task. Provide me with the code, broken down step by step and formatted in the style of a Kaggle notebook. Note that I’m creating a notebook from scratch and haven’t installed any libraries or datasets yet.”

Result:

Screenshot from ChatGPT-4

Next, I simply pasted the code into my notebook. The headers and comments for the steps went into ‘Markdown’ cells, and the code, obviously, into ‘code’ cells.
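Since the screenshot above can’t show the details, here is a minimal sketch of the kind of conversion code GPT-4 walked me through. It is an approximation rather than my exact notebook: it assumes the librosa, numpy, and matplotlib libraries, and the input and output paths are hypothetical.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
from pathlib import Path

# Hypothetical paths: the TESS audio as attached to the Kaggle notebook,
# and an output folder per emotion inside the notebook workspace
AUDIO_DIR = Path("/kaggle/input/toronto-emotional-speech-set-tess")
OUT_DIR = Path("/kaggle/working/spectrograms")

for wav_path in AUDIO_DIR.rglob("*.wav"):
    # TESS file names contain the emotion, e.g. "OAF_back_sad.wav"
    name = wav_path.stem.lower()
    emotion = "sad" if "sad" in name else "neutral" if "neutral" in name else None
    if emotion is None:
        continue  # skip the other emotions in the dataset

    # Load the recording and compute a mel spectrogram on a decibel scale
    y, sr = librosa.load(wav_path)
    mel_db = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr), ref=np.max)

    # Save it as a plain image with no axes, ready for an image classifier
    out_path = OUT_DIR / emotion / f"{wav_path.stem}.png"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    plt.figure(figsize=(3, 3))
    librosa.display.specshow(mel_db, sr=sr)
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close()
```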

Just to set expectations, the way GPT-4 responds to my prompts might vary slightly for you due to its stochastic (random) approach to text generation. This means that even with the same input, the model might produce slightly different answers. I’ll offer a basic framework to demonstrate the typical sequence of steps. Be mentally prepared that you might need to refine your requests and polish the result on your own. Sometimes it took me a couple of hours, but it’s worth it, believe me!

Example of a neutral voice: audio + spectrogram

Example of a neutral voice from TESS dataset
Spectrogram of a neutral voice

And a sad one:

Example of a sad voice from TESS dataset
Spectrogram of a sad voice

When I saw the first results, I had something like a research itch. And it was hard to stop.

So, what’s next?

I have the training and validation data. Now I need to find a recording to test the model.

I wanted to find a genuinely emotional voice, preferably one whose emotionality is officially confirmed. And remember, we need a female voice!

What associations do you have with the combination of ‘undeniable sadness’ + ‘female voice’? For me, it’s Meryl Streep! A three-time Oscar winner.

For the test, I decided to choose the famous ‘sad’ monologue of Meryl Streep from the film “Sophie’s Choice”.

A fragment of Meryl Streep’s monologue from the movie “Sophie’s Choice”.
Spectrogram of a fragment of Meryl’s monologue.

Here’s the link to my notebook with all the data for my dataset, in case you want to repeat my experiment.

Step 6: Training the model

The most challenging and crucial part!

To make sure I didn’t accidentally modify the original code, I opened the original notebook from lesson #1 and created my own copy using the ‘Edit my copy’ option.

Instead of the pictures of birds and forests, I used my own dataset of spectrograms, which I uploaded into my copy of the notebook. (Of course, I asked GPT-4 how exactly to upload my files to the notebook so I could use them in the code.)
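For orientation, this is roughly what the lesson #1 training code looks like once it points at a folder of spectrograms instead of bird photos. It is a sketch based on the standard fastai API, not my actual notebook, and the dataset path is hypothetical.

```python
from fastai.vision.all import *

# Hypothetical location of the spectrogram images, organised into
# one sub-folder per class: .../spectrograms/sad and .../spectrograms/neutral
path = Path("/kaggle/input/my-spectrograms")

# Label each image by its parent folder name and hold out 20% for validation
dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,
    seed=42,
    item_tfms=Resize(192),  # resize every spectrogram to the same size
)

# Fine-tune a small pre-trained network, just as in lesson #1 of the course
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
```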

Obviously, I couldn’t get the model running on the first try. To fix the code, I not only consulted GPT-4 but also searched for solutions on Google and specialized forums. Yet, during my most challenging moments, I had to ask for help from a programmer friend. But in the end, I finally figured out what the point of variables in code is! Yay!

After a few agonizing hours, the code worked! And the model started the training process without those horrible big red error messages.

The process of training my first model on the Kaggle platform.

And that’s when the absolute shock finally hit me: “Oh my God! I’ve created my own Deep Learning model, and I don’t even fully understand how I did it!”

Step 7: Testing the model ✨

Now it was time to test a fragment of Meryl Streep’s monologue.
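Mechanically, the test itself boils down to a single prediction call. Here is a sketch that continues from the training snippet above; the file name of the monologue’s spectrogram is made up.

```python
from fastai.vision.all import PILImage

# Hypothetical file name for the spectrogram of the monologue fragment
img = PILImage.create("sophies_choice_fragment.png")

# learn is the model fine-tuned in the previous step
label, idx, probs = learn.predict(img)
print(f"Predicted emotion: {label} ({probs[idx].item():.0%} confidence)")
```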

I held my breath. After all, at this moment, the model was to deliver a harsh judgment on the acting abilities of the famous actress. (Does she work on Stanislavski’s system or not?).

So, what do you think? The model determined the level of sadness in the voice as 99 percent! 🎉

Screenshot from Kaggle Notebook with DL-model test results

Hooray! Meryl Streep lived up to her Oscar-winning status!

Illustration by Maria Kovalevich

This experience changed my view of the industry a lot.

Initially, as a product designer, I viewed AI as a mysterious and overwhelming force, far removed from my domain of expertise.

However, after diving into the fundamentals and building my first Deep Learning model, my perspective shifted dramatically. What once seemed like a magical black box became a comprehensible tool.

With this newfound understanding, I began to see AI not just as a technological marvel, but as an instrument I could integrate into my design process, enhancing my work and bringing innovative solutions to the forefront.

And the last thing

I didn’t burden you with unnecessary boring details. But in case you have questions, I’m at your service and will be glad to assist you in your challenging journey.

Don’t give up, a well-deserved reward awaits you at the end.

And one more thing (really the last one, I promise!)

The most important lesson I learned from this project can be articulated as follows:

When you use ML, you operate from first principles. You don’t delve into the “why”; it just happens. Instead of explaining “why”, you simply provide the data: “this is a bird” and “this is a forest”. From there, it learns.