What is Supervised Learning?

28 May 2024 · Technology
Defining Supervised Learning (SL)

Machine Learning (ML) is a field of study that has defied what we thought was possible. ML algorithms can help predict health risks, forecast the weather, identify user preferences, and estimate the potential success of movies and books. They are the power behind tools like ChatGPT, Midjourney, and Sora. Machine Learning algorithms have given systems human-like capabilities: thanks to them, machines can write, hear, and speak. But how can they do that? Why are they so effective? How can a machine learn to respond the way we do? Data Scientists rely on several training techniques to make this possible. One of the most relevant and popular is Supervised Learning. Let’s dive into what it is and why it is so important!

What is Supervised Learning?

Supervised Learning is a Machine Learning technique that allows computers to learn from labeled data, meaning examples that come paired with the correct answer. In this manner, machines can process information and make decisions without explicit, hand-written rules. The quality of the labels matters: accurate, consistent labels are one of the main drivers of a model’s accuracy.

Labels help the model learn by showing it, over many examples, which answers are right and which are wrong. Data, as you may well know, can involve a wide range of factors. Common examples include age, gender, color, text, measurements, and shape. These are the input features. By pairing them with labels, machines can learn to recognize patterns and make accurate classifications and predictions.
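To make the idea of labeled data concrete, here is a minimal sketch of what such a dataset can look like. The fruit measurements and label names are made up purely for illustration.

```python
import numpy as np

# Hypothetical toy data: each row describes one fruit with two features.
# Columns: weight in grams, diameter in centimeters (values are invented).
features = np.array([
    [150.0, 7.0],
    [170.0, 7.5],
    [120.0, 6.0],
    [10.0, 2.0],
    [12.0, 2.2],
])

# One label per row: the answer a person assigned to that example.
labels = np.array(["apple", "apple", "apple", "cherry", "cherry"])

# A labeled dataset is simply the features paired with their labels.
for row, label in zip(features, labels):
    print(f"weight={row[0]:.0f} g, diameter={row[1]:.1f} cm -> {label}")
```

Each row of features has exactly one label, and that pairing is what the model learns from.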

Supervised Learning methods require people to participate actively in the training process, mainly by providing the labeled examples that guide, or "supervise," the model. Engineers must also update and refine the data to maintain accuracy over time. That makes Supervised Learning time-consuming but very effective. Machine Learning Engineering teams use programming languages like Python and R to perform Supervised Learning tasks, along with libraries like NumPy, Pandas, Scikit-learn, and TensorFlow. Some of the most popular Supervised Learning algorithms include Decision Trees and Neural Networks (NN). Polynomial Regression, Random Forest, and K-Nearest Neighbors (KNN) are also very popular.
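As a rough illustration of how interchangeable these algorithms are in Scikit-learn, the sketch below trains three of them on the library's bundled Iris dataset. The choice of dataset, split, and settings such as n_estimators=100 are assumptions made for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Load a small bundled dataset and hold out 20% of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Three of the algorithms mentioned above, with illustrative settings
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
}

# Every estimator exposes the same fit/score interface
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Because every estimator shares the same fit and score methods, swapping one algorithm for another usually means changing a single line.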

Why is Supervised Learning Important?

Supervised Learning is one of the main reasons why Artificial Intelligence (AI) has become so popular and useful these days. There is a wide range of applications for individuals and businesses, helping make our lives much simpler. Supervised Learning helps us interact with digital devices, making technology more accessible. Tools such as Amazon’s Alexa, Google Home, or Apple’s Siri use it for speech recognition. 

Also, Supervised Learning helps refine the Natural Language Understanding capabilities of these kinds of tools. It can also support medical diagnosis based on past medical records. This way, doctors can act quicker and potentially save more lives. Supervised Learning can also be a powerful tool for detecting anomalies, money laundering, and fraud.

Supervised Learning also helps businesses analyze user data to offer more personalized experiences. Companies like Netflix, Booking, Amazon, and Spotify do this to offer tailored recommendations. In this manner, Supervised Learning helps them provide more value, which can lead to more growth. At the same time, users benefit by receiving products that are more tailored to their needs. Supervised Learning can also help with things we all do on a regular basis. Examples include email filtering and traffic predictions with Google Maps.  

Main Types of Supervised Learning

Classification in Supervised Learning

In Classification models, the expected output variable is a discrete class label. In other words, a class or a category. The input features can be numerical, categorical, or both. In the training phase, the model receives labeled data to learn to recognize specific patterns. The goal is for it to select the right label or class when it processes unseen data. Classification is one of the most common and useful Machine Learning methods, with a wide range of use cases.

Common applications for Machine Learning classification models include spam detection, image classification, and disease diagnosis.

Example Using Python and Scikit-Learn

The following example uses the Iris dataset introduced by the British statistician and biologist Ronald Fisher. The script starts by loading the dataset and splitting it into training and testing sets. Then, it trains the popular K-Nearest Neighbors (KNN) algorithm on the training data. Finally, it makes predictions with the testing data and determines the model’s accuracy.

Keep in mind that in real-life scenarios, training examples and datasets need much more preprocessing. Also, if you want to run the script, you’ll need to have Python and Scikit-learn installed. You can quickly install Scikit-learn by running pip install scikit-learn in your terminal. 

# Import necessary Scikit-learn libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

# Load iris dataset
iris = load_iris()

# Create feature and target arrays
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model to the training data
knn.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn.predict(X_test)

# Calculate Accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy}')
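Once a classifier is trained, the usual next step is to classify a sample it has never been asked about. The sketch below shows one way to do that and to turn the numeric prediction back into a species name. The measurements in new_flower are hypothetical input, and for brevity the model is fitted on the full dataset rather than on a training split.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Train a KNN classifier on the full Iris dataset (for illustration only)
iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(iris.data, iris.target)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict returns the class index; target_names maps it to a readable label
predicted_index = knn.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_index]

print(f"Predicted species: {predicted_name}")
```

The model outputs an integer class index, so looking it up in iris.target_names is what turns the prediction into something a person can read.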

Regression In Supervised Learning

Unlike Classification, Regression algorithms predict target variables with continuous values, not discrete labels. Continuous targets are always numerical. Think of stock prices, sales, salaries, revenue, etc. The goal of Regression is to minimize the gap between the predicted values and the real values. The model must learn the relationship between independent and dependent variables. There are also many use cases for Regression tasks. For example, predicting business growth (dependent variable) based on sales (independent variable).

It’s worth noting that two algorithms with "Regression" in their names are easy to confuse. Linear Regression fits the kind of continuous target we’ve just discussed. Logistic Regression, despite its name, handles classification tasks and targets discrete variables. It's often used in binary classification problems when the expected output is "true" or "false." Its main focus is on probability. In other words, it estimates how likely a certain event is, and the prediction follows from that probability. An example would be to predict whether it will rain on a certain day or not.
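To show how Logistic Regression turns probabilities into class predictions, here is a minimal sketch using Scikit-learn's bundled breast cancer dataset, a binary classification problem. The dataset, the scaling step, and max_iter=1000 are assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification data: each sample is labeled 0 or 1
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale the features first, then fit the Logistic Regression model
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# predict_proba returns one probability per class for each sample
probabilities = clf.predict_proba(X_test[:3])
# predict returns the class with the highest probability
predictions = clf.predict(X_test[:3])

for probs, label in zip(probabilities, predictions):
    print(f"P(class 0)={probs[0]:.3f}, P(class 1)={probs[1]:.3f} -> predicted {label}")

print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```

The two probabilities in each row add up to 1, and the predicted class is simply the more likely of the two.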

Example Using Python, Scikit-Learn, and Numpy

In the following example, we use the California Housing dataset. Its data is derived from the 1990 U.S. Census, with one row per census block group in California. The Boston Housing dataset was long the popular choice for this kind of example, but Scikit-learn removed load_boston in version 1.2, so this example relies on fetch_california_housing instead, which downloads the data the first time it is called. Similar to the previous Classification example, the script starts by loading the dataset and splitting it into training and testing sets.

Then, it trains the Linear Regression model with the training data. Finally, it makes predictions based on the testing data and calculates the model’s Mean Squared Error (MSE). The MSE is the average of the squared differences between the predicted and the real values. The lower the MSE is, the closer the predictions are to the real values.

# Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import numpy as np

# Load California Housing dataset (load_boston was removed in Scikit-learn 1.2)
housing = fetch_california_housing()

# Create feature and target arrays
X = housing.data
y = housing.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
lr = LinearRegression()

# Fit the model to the training data
lr.fit(X_train, y_train)

# Make predictions on the test data
y_pred = lr.predict(X_test)

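# Calculate Mean Squared Error
mse = metrics.mean_squared_error(y_test, y_pred)
# Root Mean Squared Error, in the same units as the target
rmse = np.sqrt(mse)

print(f'Mean Squared Error: {mse}')
print(f'Root Mean Squared Error: {rmse}')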
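To see what the MSE measures, the following sketch computes it by hand on four made-up values and compares the result with Scikit-learn's function. The numbers in y_true and y_hat are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical real values and model predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

# Differences: 0.5, 0.0, -1.5, -1.0 -> squares: 0.25, 0.0, 2.25, 1.0
squared_errors = (y_true - y_hat) ** 2

# The MSE is the mean of the squared errors: 3.5 / 4 = 0.875
manual_mse = squared_errors.mean()
library_mse = mean_squared_error(y_true, y_hat)

print(f"Manual MSE:  {manual_mse}")
print(f"Library MSE: {library_mse}")
```

Squaring the differences keeps positive and negative errors from canceling out and penalizes large misses more heavily than small ones.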

How Does Supervised Learning Work?

Supervised Learning algorithms are trained with labeled datasets, which pair input features with the correct outputs. That’s why we mentioned the training process is like telling the model the correct answer. The training phase also involves correcting the model when its predictions are wrong. This way, it can improve its accuracy over time. The training datasets must include pairs of inputs and outputs where the results are known. In effect, a Supervised Learning algorithm learns a mapping function from input variables to an output variable.
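One common way to "correct" a model is gradient descent, which nudges the model's parameters in the direction that reduces the prediction error. The sketch below applies it to a one-variable linear model. The data, the learning rate, and the number of steps are all assumptions chosen so the example converges, and many algorithms learn in other ways.

```python
import numpy as np

# Hypothetical labeled data generated from the rule y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# The model is y_pred = w * x + b, starting from an uninformed guess
w, b = 0.0, 0.0
learning_rate = 0.05

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y  # how wrong each prediction is

    # Gradients of the Mean Squared Error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)

    # Correct the parameters by stepping against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"Learned mapping: y = {w:.3f} * x + {b:.3f}")
```

After enough corrections, w and b settle close to 2 and 1, which means the model has recovered the mapping from inputs to outputs that produced the labels.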

The process starts by getting relevant data. Then, Machine Learning Engineers must preprocess the data so that it is in a suitable format for the model. During training, the algorithm adjusts the model’s internal parameters to fit the data, while engineers tune the settings that control training, often called hyperparameters, to improve accuracy. In simple terms, these settings are like ingredient amounts for a recipe you’re trying to make with Supervised Learning algorithms. They allow Machine Learning Engineers to make small tweaks to improve results.
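The preprocessing and tweaking steps can be sketched with a Scikit-learn Pipeline and a grid search. Here the setting being tuned is the number of neighbors in KNN (a setting chosen by the engineer, often called a hyperparameter). The candidate values and the five-fold cross-validation are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing (feature scaling) and the model, chained into one estimator
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate "ingredient amounts" to try for the number of neighbors
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}

# Try each candidate with 5-fold cross-validation on the training data
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)

print(f"Best n_neighbors: {best_k}")
print(f"Test accuracy with the best setting: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means it is fitted only on the training portion of each fold, so the held-out data stays unseen during tuning.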

Before using the model to make predictions, Machine Learning Engineers must test its performance on held-out data: labeled examples the model did not see during training. Comparing the predictions against the known labels shows how well the model generalizes. Then, they can adjust the model based on the results. As mentioned, this is a continuous process. Real-world data changes all the time. That’s why we mentioned it’s crucial to update and refine the model on a regular basis.
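A common way to check performance on data the model has not seen is cross-validation, which repeats the train-and-test cycle on different slices of a labeled dataset. The sketch below is one possible setup. The dataset, the model, and the choice of five folds are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Split the data into 5 folds; each fold takes one turn as the test set
# while the model is trained on the remaining 4
fold_scores = cross_val_score(knn, X, y, cv=5)
mean_score = fold_scores.mean()

print(f"Accuracy per fold: {fold_scores}")
print(f"Mean accuracy: {mean_score:.3f}")
```

If the per-fold scores vary widely, or drop when newer data is added, that is a signal the model may need more data or retraining.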

Conclusion

Supervised Learning is one of the most important types of Machine Learning. With enough high-quality labeled data, it can deliver strong accuracy and performance for AI models and apps. Since it allows Engineers to update and retrain complex models, it's likely to remain relevant over time. It's also worth noting that Supervised Learning laid the foundation for some other advanced AI methods.