Educational Roadmap

Machine Learning Topics

A curated path from foundational regression models to advanced unsupervised learning. Each topic includes core concepts, a sample implementation, and a link to the full notebook.

1. Simple Linear Regression

Implemented basic linear regression to understand the relationship between a single independent variable and a dependent variable. Focused on cost function minimization, model evaluation (MSE, R²), and visualization of the regression line.

Cost Function · MSE · R² Score · Matplotlib
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)

# Inference
y_pred = model.predict(X_test)

# Evaluation
print("MSE:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
```
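The notes above also mention visualizing the regression line; a minimal sketch on synthetic data (the data-generating slope, intercept, and noise level below are invented for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: y ≈ 3x + 2 plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 1.5, size=100)

model = LinearRegression().fit(X, y)

plt.scatter(X, y, alpha=0.5, label="data")
plt.plot(X, model.predict(X), color="red", label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("regression_line.png")
```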

2. Multiple Linear Regression

Extended linear regression to handle multiple features. Included feature scaling, a multicollinearity check, and model interpretation using coefficients.

Feature Scaling · StandardScaler · Multicollinearity · Coefficients
```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = LinearRegression()
model.fit(X_train_scaled, y_train)

y_pred = model.predict(scaler.transform(X_test))
```
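The multicollinearity check mentioned above is not shown; one simple version inspects the feature correlation matrix (variance inflation factors are a common alternative). The toy features below are invented for illustration:

```python
import numpy as np

# Illustrative features: x2 is deliberately almost collinear with x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                      # independent
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)
# Flag feature pairs with |correlation| above a threshold
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(corr[i, j]) > 0.9]
print("Highly correlated pairs:", pairs)
```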

3. Gradient Descent from Scratch

Implemented linear regression with gradient descent from scratch to understand the optimization process and the effect of the learning rate.

Optimization · Learning Rate · Gradients · NumPy
```python
import numpy as np

# Assumes: X_b is the feature matrix with a bias column of ones,
# y the targets, m = len(X_b), and a chosen learning_rate (e.g. 0.01)
theta = np.zeros((X_b.shape[1], 1))
for iteration in range(1000):
    gradients = (2 / m) * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - learning_rate * gradients

# Prediction
y_pred = X_b.dot(theta)
```

4. Classification Basics

Built foundational classification models. Covered binary and multiclass classification, the confusion matrix, precision, recall, and F1-score.

Logistic Regression · KNN · Confusion Matrix · F1-score
```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
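`KNeighborsClassifier` is imported above but not used; a short KNN sketch on the Iris dataset (the dataset choice and `n_neighbors=5` are assumptions for illustration, not from the original notebook):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris: a small 3-class dataset, convenient for multiclass KNN
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
acc = knn.score(X_test, y_test)
```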

5. Naive Bayes

Implemented Gaussian and Multinomial Naive Bayes for fast probabilistic classification. Excellent for text and high-dimensional data.

Gaussian NB · Multinomial NB · Probabilistic ML · Text Classification
```python
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
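For the Multinomial NB and text-classification side, a minimal sketch on an invented toy corpus (the texts and spam/ham labels are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: 1 = spam, 0 = ham (entirely illustrative)
texts = ["free prize now", "win money free", "meeting at noon",
         "lunch with team", "claim free prize", "project meeting notes"]
labels = [1, 1, 0, 0, 1, 0]

# Bag-of-words counts feed the multinomial likelihood
vec = CountVectorizer()
X_counts = vec.fit_transform(texts)
nb = MultinomialNB().fit(X_counts, labels)

pred = nb.predict(vec.transform(["free money prize"]))
```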

6. Support Vector Machine (SVM)

Used SVM with different kernels (linear, RBF, polynomial) for both classification and regression. Focused on margin maximization and the soft-margin formulation.

Hyperplane · Kernels · Soft Margin · SVC
```python
from sklearn.svm import SVC
model = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
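To compare the three kernels mentioned, one might loop over them on a single synthetic dataset (the dataset parameters here are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative synthetic binary-classification data
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit one SVC per kernel and record test accuracy
scores = {}
for kernel in ["linear", "rbf", "poly"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
```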

7. Decision Tree

Built interpretable decision trees with pruning techniques (max_depth, min_samples_split) and visualized the tree structure.

Pruning · Entropy · Information Gain · Visualization
```python
from sklearn.tree import DecisionTreeClassifier, plot_tree
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
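`plot_tree` is imported above but not called; a possible visualization sketch on Iris (the dataset, depth, and figure size are assumptions for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted runs
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(iris.data, iris.target)

# Render the fitted tree with feature and class names
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(model, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
fig.savefig("decision_tree.png")
```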

8. Random Forest

Implemented bagging with the Random Forest classifier and analyzed feature importances.

Bagging · Random Forest · Feature Importance · Hyperparameter Tuning
```python
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
```
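The feature-importance analysis can be sketched as follows (the breast-cancer dataset here is an assumed stand-in, not the original data):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(data.data, data.target)

# Impurity-based importances, sorted from most to least important
order = np.argsort(rf.feature_importances_)[::-1]
top5 = [data.feature_names[i] for i in order[:5]]
print("Top 5 features:", top5)
```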

9. Ensemble Learning

Implemented advanced ensemble techniques, including voting, bagging, and boosting, to improve model accuracy and robustness.

Voting Classifier · Stacking · Boosting · AdaBoost
```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

clf1 = LogisticRegression()
clf2 = SVC(probability=True)
eclf = VotingClassifier(estimators=[('lr', clf1), ('svc', clf2)], voting='soft')
eclf.fit(X_train, y_train)
```
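AdaBoost appears in the tags but not in the snippet; a minimal boosting sketch (the dataset and hyperparameters below are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data for the boosting demo
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Sequentially fits weak learners, reweighting misclassified samples
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
acc = ada.score(X_test, y_test)
```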

10. Dimensionality Reduction (PCA)

Applied Principal Component Analysis for feature reduction, visualization, and improved model performance.

Variance · Eigenvalues · Feature Reduction · Visualization
```python
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)        # Keep 95% variance
X_reduced = pca.fit_transform(X_scaled)
```
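To verify how much variance the kept components actually explain, one can inspect `explained_variance_ratio_` (the Iris data here is an assumed stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components reaching that variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

kept = pca.explained_variance_ratio_.sum()
print(f"{X_reduced.shape[1]} components keep {kept:.1%} of the variance")
```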

11. K-Means Clustering

Performed unsupervised clustering with K-Means, using the elbow method and silhouette score to choose the number of clusters.

Elbow Method · Silhouette Score · Centroids · Unsupervised
```python
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X_scaled)
labels = kmeans.predict(X_scaled)
centers = kmeans.cluster_centers_
```
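The silhouette-based cluster selection mentioned above might look like this (the blob parameters are invented for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative synthetic data with 4 well-separated blobs
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# Score each candidate k; higher silhouette means tighter, better-separated clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```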

12. DBSCAN Clustering

Applied density-based clustering to discover clusters of arbitrary shape and detect outliers.

Density-based · Outliers · Epsilon · Min Samples
```python
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X_scaled)
```
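DBSCAN marks outliers with the label `-1`; a sketch with an injected outlier (all data below is synthetic and illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Two dense blobs plus one far-away point that should become noise
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5]],
                  cluster_std=0.4, random_state=42)
X = np.vstack([X, [[10.0, 10.0]]])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"{n_clusters} clusters, {n_noise} noise points")
```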