Welcome to the first episode of Linear Algebra for Machine Learning!
This series is designed to help you master the mathematical foundations of machine learning, one step at a time. Today, we start with vectors—the fundamental building blocks of data representation in ML. From representing pixel intensities in an image to encoding user preferences in a recommendation system, vectors form the backbone of almost every machine learning model.
By the end of this episode, you’ll have a solid understanding of:
What vectors are and why they matter in machine learning.
Core properties like magnitude, direction, and dimensionality.
Real-world applications of vectors in ML.
Hands-on coding to reinforce your learning.
Let’s dive in!
What Are Vectors?
A vector is an ordered collection of numbers that represents a point in space or a direction. These numbers, called components, allow us to encode information in a structured way. Mathematically, a vector is represented as:
\(\mathbf{v} = [v_1, v_2, ..., v_n]\)
Here, v1, v2, ..., vn are the components of the vector.
Vectors are not just abstract mathematical objects—they’re practical tools used to represent:
Data Points: A customer’s age, income, and spending score as [35, 70, 8].
Images: Each pixel’s intensity in grayscale or RGB.
Text: Word embeddings like [0.1, 0.7, −0.3] that capture semantic meaning.
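A quick sketch of these three representations in NumPy (the values are taken from, or modeled on, the examples above):

```python
import numpy as np

# A customer as a data point: [age, income, spending score]
customer = np.array([35, 70, 8])

# A (tiny) grayscale image row: pixel intensities from 0 to 255
pixel_row = np.array([0, 128, 255])

# A word embedding capturing semantic meaning
embedding = np.array([0.1, 0.7, -0.3])

# Despite representing very different things, all three are
# just ordered collections of numbers
print(customer.shape, pixel_row.shape, embedding.shape)
```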
Why Do Vectors Matter in ML?
Vectors allow us to:
Represent Data: In ML, datasets are often stored as matrices where rows are data points (vectors) and columns are features.
Perform Transformations: Algorithms like PCA rely on vector transformations to reduce dimensionality.
Measure Similarity: Cosine similarity between vectors powers applications like text search and recommendation systems.
Key Properties of Vectors:
Understanding the properties of vectors is crucial to unlocking their potential in ML:
Magnitude (Length):
Measures the size of the vector.
Formula:
\(||\mathbf{v}|| = \sqrt{v_1^2 + v_2^2 + ... + v_n^2}\)
Example: The vector [3, 4] has a magnitude of:
\(\sqrt{3^2 + 4^2} = 5\)
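We can verify this calculation with NumPy’s `np.linalg.norm`, which computes exactly this formula:

```python
import numpy as np

v = np.array([3, 4])

# np.linalg.norm computes sqrt(3^2 + 4^2)
magnitude = np.linalg.norm(v)
print(magnitude)  # → 5.0
```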
Direction:
Indicates where the vector points in space.
Often represented as a unit vector with magnitude 1.
Formula:
\(\hat{\mathbf{v}} = \frac{\mathbf{v}}{||\mathbf{v}||}\)
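Normalizing a vector in NumPy is a single division by its magnitude. Using the [3, 4] example from above:

```python
import numpy as np

v = np.array([3, 4])

# Divide the vector by its magnitude to get the unit vector
unit_v = v / np.linalg.norm(v)
print(unit_v)                   # → [0.6 0.8]

# A unit vector always has magnitude 1
print(np.linalg.norm(unit_v))
```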
Dimensionality:
Refers to the number of components in the vector.
2D Example: [x, y], e.g., [3, 4]
High-Dimensional Example: [v1,v2,...,vn]
Components:
Each number in the vector corresponds to a feature or attribute.
Vector Operations
Vector operations form the basis of many ML algorithms. Let’s break them down with examples:
Addition and Subtraction: Combine or compare data points.
Formula:
\(\mathbf{a}+\mathbf{b}=[a_1+b_1, a_2+b_2, ..., a_n+b_n]\)
Example: Adding user features like [age, income] to build a composite profile.
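In NumPy, addition and subtraction are element-wise, matching the formula. Here is a sketch with two hypothetical [age, income] profiles:

```python
import numpy as np

a = np.array([25, 60])  # hypothetical user A: [age, income]
b = np.array([30, 45])  # hypothetical user B: [age, income]

# Element-wise addition and subtraction
print(a + b)  # → [55 105]
print(a - b)  # → [-5  15]
```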
Scaling (Multiplication by a Scalar): Adjust the magnitude of a vector without changing its direction.
Formula:
\(c \cdot \mathbf{v} = [c \cdot v_1, c \cdot v_2, ..., c \cdot v_n]\)
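Scaling multiplies every component by the same scalar, so the magnitude changes but the direction (the unit vector) stays the same. A quick check:

```python
import numpy as np

v = np.array([3.0, 4.0])
scaled = 2 * v                  # every component doubles: [6. 8.]
print(scaled)

# Magnitude doubles from 5 to 10...
print(np.linalg.norm(scaled))   # → 10.0

# ...but the direction is unchanged: both unit vectors match
print(scaled / np.linalg.norm(scaled))
print(v / np.linalg.norm(v))
```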
Dot Product: Measures similarity between two vectors.
Formula:
\(\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i\)
Use Case: Recommender systems use the dot product to compare user and item vectors.
Cross Product (3D): Used in computer vision and 3D modeling.
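The cross product of two 3D vectors yields a third vector perpendicular to both, which is why it shows up in computer vision and 3D modeling. A minimal sketch using the x- and y-axes:

```python
import numpy as np

a = np.array([1, 0, 0])  # x-axis
b = np.array([0, 1, 0])  # y-axis

# The cross product of the x- and y-axes is the z-axis
c = np.cross(a, b)
print(c)  # → [0 0 1]

# Perpendicularity check: dot products with both inputs are zero
print(np.dot(c, a), np.dot(c, b))
```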
Hands-On Coding
Let’s put these operations to work in a small, real-world-style ML example:
import numpy as np
# Define vectors
user_vector = np.array([25, 60, 8]) # [age, income, spending score]
item_vector = np.array([0.4, 0.5, 0.1]) # [weight factors]
# Dot product to calculate recommendation score
score = np.dot(user_vector, item_vector)
print("Recommendation Score:", score)
# Normalize vectors for cosine similarity
norm_user = user_vector / np.linalg.norm(user_vector)
norm_item = item_vector / np.linalg.norm(item_vector)
similarity = np.dot(norm_user, norm_item)
print("Cosine Similarity:", similarity)
That’s a wrap (for now!)
Vectors are more than just numbers—they’re the language of machine learning. From representing data to enabling transformations, they’re at the heart of every algorithm.
Next week, we’ll explore Vector Operations in depth—how to combine, transform, and optimize vectors to unlock the true power of ML.