Mathematics for Machine Learning

Fact-checked by Rafael Tabasca, Full Stack Developer @ Capicua.

Machine Learning can play chess at a level beyond champions like Garry Kasparov and Magnus Carlsen. Its practical applications go from recognizing human faces to identifying suspicious patterns. Moreover, it can analyze preferences to give personalized recommendations. Unsurprisingly, knowing how to code isn't enough to create such powerful tools. As you may know, the field of Machine Learning entails complex math operations whose difficulty depends on the complexity of your model. This article will focus on the leading mathematical concepts, from linear algebra to statistics, that Machine Learning Engineers and Data Scientists use to build stunning and advanced applications. These apps include well-known names like ChatGPT and OpenArtAI. Are you ready to discuss the background knowledge you should have on mathematics for Machine Learning algorithm training?

Linear Functions in Machine Learning

Linear functions are fundamental in Machine Learning. They are present in a wide variety of popular algorithms. A linear function computes a linear combination of its inputs, and the coefficients of the inputs serve as the function's parameters. In its simplest form, with a single input, this type of function follows the equation y = wx + b. Here, y is the output, x is the input, w is the weight, and b is an offset (bias).
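To make the equation concrete, here is a minimal sketch in Python. The function name linear and the values chosen for w and b are assumptions made only for illustration.

```python
def linear(x, w, b):
    """Return y = w * x + b for a single input x."""
    return w * x + b


# Hypothetical parameters, chosen only for illustration.
w, b = 2.0, 1.0

# Evaluate the same function on a few inputs.
outputs = [linear(x, w, b) for x in [0.0, 1.0, 2.0]]
print(outputs)  # [1.0, 3.0, 5.0]
```

Changing w tilts the line, while changing b shifts it up or down.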

Despite their simplicity, linear functions perform well on many real-world problems. Plus, they are helpful in Machine Learning for several reasons. The first is Linear Regression. Here, supervised learning algorithms predict a continuous outcome (y) based on input variables (x). Also, there's Linear Classification. These algorithms allow you to classify inputs into one of several predefined classes. A well-known example is the perceptron. It assigns each input to one of two classes by using a linear function to split the input space into two regions.
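As a rough sketch of Linear Regression, the snippet below fits w and b with ordinary least squares using NumPy. The training data is hypothetical (generated from y = 2x + 1 without noise), and least squares is just one common way to fit such a model.

```python
import numpy as np

# Hypothetical training data generated from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: one column for x and a column of ones for the bias b.
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of A @ [w, b] = y.
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the outcome for a new, unseen input.
prediction = w * 4.0 + b
print(w, b, prediction)  # approximately 2.0, 1.0, 9.0
```

With noisy real-world data the fitted line would not pass through every point, but the procedure stays the same.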

Further, linear functions underpin Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The first one, PCA, has many uses in Machine Learning, like dimensionality reduction. It finds the directions in the data that contain the majority of the variance and then projects the data onto a lower-dimensional space. Likewise, LDA is a powerful dimensionality reduction technique that is critical in supervised classification problems. Some of its uses include separating two or more classes and modeling the differences between them.
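The PCA idea can be sketched with NumPy's singular value decomposition. The five 2-D points below are hypothetical and lie close to a line, so one direction should carry almost all of the variance. This is a minimal illustration, not a replacement for a library implementation.

```python
import numpy as np

# Hypothetical 2-D data that lies close to a straight line.
X = np.array([
    [1.0, 1.1],
    [2.0, 1.9],
    [3.0, 3.2],
    [4.0, 3.9],
    [5.0, 5.1],
])

# Step 1: center the data so every feature has zero mean.
X_centered = X - X.mean(axis=0)

# Step 2: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: share of the total variance captured by each direction.
explained_ratio = S**2 / np.sum(S**2)

# Step 4: project onto the first direction (2-D -> 1-D).
X_projected = X_centered @ Vt[:1].T

print(explained_ratio)
print(X_projected.shape)  # (5, 1)
```

Keeping only the first component halves the number of features while preserving nearly all of the variance in this toy dataset.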

Linear Algebra in Machine Learning

"Algebra is the intellectual instrument that has been created for rendering clear the quantitative aspects of the world" — Alfred North Whitehead.

Algebra is the branch of mathematics that deals with symbols and the rules that govern them. Scientists use it to investigate mathematical structures like equations, polynomials, and functions. Further, it's the foundation of several abstract concepts, like operations on variables. Also, it's the root of number properties, like the associative, commutative, and distributive ones. Algebraic manipulation aids in the understanding of data patterns. Beyond Machine Learning, it's a necessary field for decision-making.

It's impossible to overstate the relevance of linear algebra in Machine Learning. It plays a foundational role in a wide range of ML algorithms. These go from simple regression analysis to more complex Deep Learning techniques. Many advanced and powerful Machine Learning models wouldn't exist without linear algebra!

Linear algebra is the cornerstone for analyzing relationships between variables in datasets. Let's take a simple linear regression model as an example. Its coefficients define how each variable in the prediction model contributes to the response variable. Let's talk about the aspects of linear algebra used in Machine Learning.

Scalars for Machine Learning

These are single numbers defining a quantity without direction. For example, a scalar of "3" can stand for three apples, three years, or three miles. In Machine Learning, scalars help define relationships between variables. Python's built-in scalar types include int, float, bool, and None (NoneType). NumPy's documentation holds a lot of info about its own scalar types.

x = 3 # int
x = 3.2 # float

Here's how it would look using JavaScript:

const myScalar = 3;
let x = 3;
var y = 3;

Vectors for Machine Learning

A vector is a mathematical structure that has both size and direction. It often represents physical quantities such as velocity, force, or acceleration. In code, you can think of it as a list of numbers, that is, a one-dimensional array. Written out in a row, it is a horizontal (row) vector. Here's how you can create one using Python and NumPy.

import numpy as np
my_list = [1, 2, 3]
my_vector = np.array(my_list)  # convert the list into a NumPy array
print(my_vector)  # Output: [1 2 3]

One quick way to create a vertical vector is the reshape(-1, 1) method. The shape attribute then confirms the result:

vr_vector = my_vector.reshape(-1, 1)
print(vr_vector.shape)  # Output: (3, 1)


Matrices for Machine Learning

These are rectangular arrangements of numbers in rows and columns. Matrices can be stored as 2-dimensional arrays to represent and manage large datasets. In turn, they support operations like multiplication, inversion, and decomposition.

In Supervised Learning, matrices hold the features and labels of the training data. Likewise, in unsupervised learning, they help depict similarities or dissimilarities between different examples. Matrix operations also apply to optimization methods and the regularization of Machine Learning models. With Python, you can create matrices as a 2-dimensional list (a list of lists).

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

This operation results in a 3x3 matrix. Each inner list represents a row, and the integers within each internal list represent the row's elements. We can also create a matrix in Python by using the NumPy library, which has additional functionality for working with vectors and matrices.

import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
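The operations mentioned earlier (multiplication, inversion, and decomposition) can be sketched with NumPy as follows. Note that the 3x3 matrix above happens to have no inverse because its determinant is zero, so this sketch uses hypothetical 2x2 matrices instead. The eigendecomposition shown is only one of several decompositions.

```python
import numpy as np

# Hypothetical 2x2 matrices, chosen so that A is invertible.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Matrix multiplication with the @ operator.
product = A @ B

# Inversion: defined here because det(A) = 2*3 - 1*1 = 5, not zero.
A_inv = np.linalg.inv(A)
identity_check = A @ A_inv  # close to the 2x2 identity matrix

# Decomposition: eigenvalues and eigenvectors of A.
eigenvalues, eigenvectors = np.linalg.eig(A)

print(product)
print(np.allclose(identity_check, np.eye(2)))  # True
```

Multiplying by B swaps the columns of A, which is an easy way to check the product by hand.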

Tensors for Machine Learning

A tensor is a multidimensional array of numbers. In ML, tensors represent various data types, such as images, videos, and audio. Tensors can have any number of dimensions, so the term covers scalars, vectors, matrices, and higher-dimensional arrays. A one-dimensional tensor is a vector, while a two-dimensional tensor is a matrix.

In Deep Learning, tensors represent the data flowing through the network. This includes input data, intermediate representations, and output predictions. Tensors also appear in the network's computations, such as matrix multiplication and non-linear activation functions.

The following is a straightforward example of a two-dimensional tensor using JavaScript:

const tensor = [ [1,2,3],[4,5,6],[7,8,9] ];

The example above is a way of conveying tensors in JavaScript without libraries. You would use a library like TensorFlow.js to perform tensor operations.
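For comparison, here is a small Python sketch that builds tensors of increasing dimension with NumPy and inspects them through the ndim and shape attributes. The variable names and the tiny "image" are hypothetical, picked only to show how the number of dimensions grows.

```python
import numpy as np

scalar = np.array(5.0)               # 0 dimensions
vector = np.array([1.0, 2.0, 3.0])   # 1 dimension
matrix = np.array([[1, 2], [3, 4]])  # 2 dimensions

# Hypothetical tiny "image": 2 rows x 2 columns x 3 color channels.
image = np.zeros((2, 2, 3))          # 3 dimensions

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("image", image)]:
    print(name, tensor.ndim, tensor.shape)
```

A batch of such images would simply add one more leading dimension.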

Statistics in Machine Learning

Statistics explores data collection, organization, analysis, interpretation, and presentation. It draws on probability theory, mathematical analysis, and algorithms to reach conclusions about large datasets. Further, it provides fundamental concepts and methods for comprehending and analyzing data.

The science of statistics is often used in ML to model or predict outcomes from given inputs. Statistical models make predictions based on previous observations, which can surface new insights. Machine Learning algorithms identify patterns in input data and use them to predict events. To do so, they rely on techniques like regression analysis, clustering, and classification. As a result, they can offer greater accuracy than traditional methods on many tasks. The main types of statistics are descriptive and inferential.

Descriptive Statistics 

Descriptive statistics summarize a dataset's properties without drawing conclusions or making predictions. Common techniques include histograms, boxplots, scatter plots, and bar charts. Further, these enable exploratory data analysis and help condense large datasets into readable summaries.

Inferential Statistics

Inferential statistics focus on making predictions or generalizations based on a sample's data. This type of analysis helps researchers draw conclusions about a population. In ML, it helps forecast with incomplete data or extrapolate trends from small datasets.

Measure of Spread in Machine Learning

A measure of spread describes how much the values in a dataset vary. The most common examples are range, interquartile range, variance, and standard deviation. The measure of spread is essential in ML because it provides insight into the data distribution. With it, specialists can make informed decisions on feature scaling and model selection. Furthermore, they can identify outliers and unusual values that may need special treatment.
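The four measures named above can be computed with Python's built-in statistics module. The dataset is hypothetical, and the variance and standard deviation shown are the population versions (pvariance and pstdev). For a sample you would typically use variance and stdev instead.

```python
import statistics

# Hypothetical dataset, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance between the largest and smallest value (9 - 2 = 7).
value_range = max(data) - min(data)

# Interquartile range: distance between the third and first quartile.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Population variance and standard deviation.
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(value_range, iqr, variance, std_dev)
```

Different libraries compute quartiles with slightly different interpolation rules, so the interquartile range can vary a little between tools.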

Hypothesis Testing in Machine Learning

Hypothesis testing focuses on making inferences about a population based on a sample of data. It entails stating a null and an alternative hypothesis. Then, it applies statistics to compute the probability of observing data at least as extreme as the sample if the null hypothesis is true. The test result determines whether researchers should reject the null hypothesis or not.
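The sketch below walks through those steps with an exact permutation test, which is only one of many possible tests. The two groups, the 0.05 significance level, and all variable names are assumptions made for illustration. The null hypothesis is that both groups come from the same distribution, so the group labels are interchangeable.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two groups.
group_a = [12, 14, 15, 16]
group_b = [8, 9, 10, 11]

# Test statistic: difference between the group means.
observed = mean(group_a) - mean(group_b)

pooled = group_a + group_b
n_a = len(group_a)

# Under the null hypothesis every relabeling is equally likely,
# so enumerate all ways to pick which values form the first group.
extreme = 0
total = 0
for indices in combinations(range(len(pooled)), n_a):
    chosen = set(indices)
    first = [pooled[i] for i in range(len(pooled)) if i in chosen]
    second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = mean(first) - mean(second)
    if abs(diff) >= abs(observed):
        extreme += 1
    total += 1

# p-value: share of relabelings at least as extreme as the data.
p_value = extreme / total
alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(p_value, reject_null)
```

Here only 2 of the 70 possible relabelings are as extreme as the observed split, so the p-value falls below 0.05 and the null hypothesis is rejected.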

Probability Theory in Machine Learning

Probability measures the likelihood of an event happening. Probabilities range from 0 to 1, where 0 is an impossible event and 1 is a certain one. Probability Theory is a fundamental concept used in a wide range of fields. Instances include decision-making, risk management, and, of course, Machine Learning. In ML, it helps make predictions and estimate performance. Many ML methods, like Bayesian, Markov, and Gaussian models, rely on it.
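As a small worked example of the Bayesian reasoning mentioned above, the sketch below applies Bayes' theorem to a spam filter. All probabilities are hypothetical numbers picked for illustration, not measurements from any real dataset.

```python
# Hypothetical probabilities, chosen only for illustration.
p_spam = 0.2              # P(spam): share of emails that are spam
p_word_given_spam = 0.5   # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

p_ham = 1.0 - p_spam

# Total probability of seeing the word in any email.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word).
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 3))  # 0.714
```

Seeing the word raises the probability of spam from 0.2 to roughly 0.71, which is the kind of update Bayesian models perform at scale.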

Relevant ML Statistics Concepts

There are core concepts to know before diving into modern Machine Learning algorithms. 

Data. It's the information collected and analyzed to draw conclusions or make inferences. Data can be numerical or categorical, and researchers gather it with several techniques.
Population. It refers to all the objects or measurements whose properties are under study. It is the entire set of observations or data points from which a sample is drawn.
Sampling. The concept refers to selecting a subset from a larger population. The most common sampling types include Stratified, Cluster, and Multistage.
Parameters & Variables. A parameter is a metric that describes a trait of the whole population. Meanwhile, variables are the characteristics of interest measured for each entity in a population.
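The concepts above can be tied together in a short sketch of stratified sampling. The population, the two groups "A" and "B", the 10% sampling fraction, and the helper stratified_sample are all hypothetical. The fixed seed only makes the run repeatable.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: (group, value) pairs with two strata.
population = [("A", v) for v in range(60)] + [("B", v) for v in range(100, 140)]


def stratified_sample(items, fraction, rng):
    """Draw the same fraction from every stratum (proportional allocation)."""
    strata = {}
    for group, value in items:
        strata.setdefault(group, []).append(value)
    sample = []
    for group, values in strata.items():
        k = round(len(values) * fraction)
        sample.extend((group, v) for v in rng.sample(values, k))
    return sample


sample = stratified_sample(population, 0.1, rng)

# Parameter: the mean of the variable over the whole population.
population_mean = mean([v for _, v in population])

# The sample mean is an estimate of that parameter.
sample_mean = mean([v for _, v in sample])

print(len(sample), population_mean, sample_mean)
```

Because each group contributes in proportion to its size, the sample keeps the 60/40 split of the population.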


A deep understanding of Math is essential in Artificial Intelligence (AI), Computer Science, Machine Learning, Neural Networks, and Deep Learning (DL). You need a solid grasp of these key concepts to understand how ML algorithms work and how to create models. Further foundational ideas include logistic regression, conditional probability, probability distributions, gradient descent, confidence intervals, algebraic equations, and multivariable calculus.

Statistics is another key topic if you're considering a career in Machine Learning and Data Science. It gives you a solid understanding of Machine Learning and helps you make sense of the underlying structure of the data. All of the above is vital for industry experts who want to build accurate, robust models, like Generative Adversarial Networks (GANs), and succeed in the Data Science industry.