02 — Learning

Machine Learning Topics

A curated path from foundational regression models to advanced unsupervised learning. Each topic includes the core concepts, a sample implementation, and a link to the full notebook.

Simple Linear Regression

Implemented basic linear regression to understand the relationship between a single independent variable and a dependent variable. Focused on cost function minimization, model evaluation (MSE, R²), and visualization of the regression line.

Cost Function, MSE, R² Score, Matplotlib

    
python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Training
# Assumes X (shape: n_samples x 1) and y are already loaded
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)

# Inference
y_pred = model.predict(X_test)

# Evaluation
print("MSE:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
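
The two metrics printed above can also be computed by hand, which makes the cost function explicit. The following is a minimal sketch on made-up numbers (`y_true` and `y_hat` are illustrative values, not data from the notebook), checked against the scikit-learn functions used in the sample.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative targets and predictions (not taken from the notebook)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

# MSE: mean of the squared residuals
mse = np.mean((y_true - y_hat) ** 2)

# R²: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse, "| sklearn:", mean_squared_error(y_true, y_hat))
print("R²:", r2, "| sklearn:", r2_score(y_true, y_hat))
```

An R² close to 1 means the residual error is small relative to the variance of the target itself.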
 

View Notebook

Multiple Linear Regression

Extended linear regression to handle multiple features. Included feature scaling, a multicollinearity check, and model interpretation using the fitted coefficients.

Feature Scaling, StandardScaler, Multicollinearity, Coefficients

    
python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = LinearRegression()
model.fit(X_train_scaled, y_train)

y_pred = model.predict(scaler.transform(X_test))
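
The sample above does not show the multicollinearity check. One common way to do it is the variance inflation factor (VIF); the sketch below is an assumption about how such a check could look, not necessarily what the notebook does. The helper `vif` and the synthetic columns `a`, `b`, `c` are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """VIF per feature: regress each column on the others, then 1 / (1 - R²)."""
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

# Synthetic data: column c is nearly a copy of column a
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X_demo = np.column_stack([a, b, c])

vif_scores = vif(X_demo)
print("VIF per feature:", vif_scores)
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign that a feature is largely explained by the others; here `a` and `c` should stand out while `b` stays near 1.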
 

View Notebook

Gradient Descent from Scratch

Implemented linear regression with a hand-written gradient descent loop to understand the optimization process and the effect of the learning rate.

Optimization, Learning Rate, Gradients, NumPy

    
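python
import numpy as np

# Assumes X_b (features with a leading bias column of ones), y (m x 1),
# and learning_rate are already defined.
m = X_b.shape[0]
theta = np.zeros((X_b.shape[1], 1))  # one weight per column of X_b, bias included
for iteration in range(1000):
    gradients = (2 / m) * X_b.T.dot(X_b.dot(theta) - y)  # gradient of the MSE cost
    theta = theta - learning_rate * gradients
# Prediction
y_pred = X_b.dot(theta)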
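
To make the learning rate effect concrete, the same update rule can be run with several step sizes on synthetic data. This is a minimal sketch: the function `run_gd`, the data (`y = 4 + 3x` plus noise), and the three learning rates are assumptions chosen for illustration.

```python
import numpy as np

def run_gd(X_b, y, learning_rate, n_iters=100):
    """Batch gradient descent on the MSE cost; returns theta and the final MSE."""
    m, n = X_b.shape
    theta = np.zeros((n, 1))
    for _ in range(n_iters):
        gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
        theta = theta - learning_rate * gradients
    return theta, float(np.mean((X_b @ theta - y) ** 2))

# Synthetic data: y = 4 + 3x + small noise
rng = np.random.default_rng(0)
x = 2 * rng.random((100, 1))
y = 4 + 3 * x + 0.1 * rng.normal(size=(100, 1))
X_b = np.c_[np.ones((100, 1)), x]  # prepend the bias column

final_mse = {}
for lr in (0.001, 0.1, 0.6):
    theta, final_mse[lr] = run_gd(X_b, y, lr)
    print(f"learning_rate={lr}: final MSE = {final_mse[lr]:.4g}")
```

On this data the smallest rate has barely moved after 100 iterations, the middle one gets close to the noise floor, and the largest one overshoots on every step so the error grows instead of shrinking.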
 

View Notebook

Classification Basics

Built foundational classification models. Covered binary & multiclass classification, confusion matrix, precision, recall, and F1-score.

Logistic Regression, KNN, Confusion Matrix, F1-score

    
python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
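
The numbers in the classification report follow directly from the confusion matrix. A minimal sketch for the binary case, using made-up labels (`y_true` and `y_hat` are illustrative, not notebook data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Illustrative binary labels and predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("precision:", precision, "| sklearn:", precision_score(y_true, y_hat))
print("recall:", recall, "| sklearn:", recall_score(y_true, y_hat))
print("F1:", f1, "| sklearn:", f1_score(y_true, y_hat))
```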
 

View Notebook Live Demo

Naive Bayes

Implemented Gaussian and Multinomial Naive Bayes for fast probabilistic classification. Multinomial NB is well suited to text and other high-dimensional count data.

Gaussian NB, Multinomial NB, Probabilistic ML, Text Classification

    
python
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
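
The sample above covers the Gaussian variant only. For the text classification use case, a Multinomial NB model is usually fed word counts. The sketch below assumes a bag-of-words pipeline with `CountVectorizer`; the four-document corpus and its labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus
texts = [
    "free prize money now",
    "win free cash prize",
    "meeting agenda for monday",
    "project status meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Word counts feed the multinomial likelihood; default smoothing (alpha=1) applies
text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(texts, labels)

pred = text_clf.predict(["free cash now", "monday project meeting"])
print(pred)
```

The smoothing matters here: words that never appeared in one class would otherwise force that class probability to zero.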
 

View Notebook Live Demo

Support Vector Machine (SVM)

Used SVM with different kernels (linear, rbf, poly) for both classification and regression. Focused on maximizing the margin around the separating hyperplane and on the soft-margin formulation.

Hyperplane, Kernels, Soft Margin, SVC

    
python
from sklearn.svm import SVC
model = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
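
A quick way to see why the kernel choice matters is to cross-validate the three kernels on data that is not linearly separable. This is a sketch under assumptions: the two-moons toy dataset, the scaling step, and 5-fold cross-validation are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-circles: a curved class boundary
X_demo, y_demo = make_moons(n_samples=300, noise=0.2, random_state=42)

kernel_scores = {}
for kernel in ("linear", "poly", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0, gamma="scale"))
    kernel_scores[kernel] = cross_val_score(clf, X_demo, y_demo, cv=5).mean()
    print(f"{kernel}: mean CV accuracy = {kernel_scores[kernel]:.3f}")
```

`C` controls the soft margin: smaller values tolerate more margin violations in exchange for a wider margin.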
 

View Notebook Live Demo

Decision Tree

Built interpretable decision trees, limited overfitting with pre-pruning constraints (max_depth, min_samples_split), and visualized the tree structure.

Pruning, Entropy, Information Gain, Visualization

    
python
from sklearn.tree import DecisionTreeClassifier, plot_tree
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
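
The sample imports `plot_tree` but does not call it. As a text-only alternative that needs no plotting backend, scikit-learn's `export_text` prints the learned rules. The sketch below assumes the Iris dataset and a shallow tree purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Pre-pruning: cap the depth and require enough samples before splitting
tree = DecisionTreeClassifier(max_depth=2, min_samples_split=10, random_state=42)
tree.fit(iris.data, iris.target)

# One line per node, indented by depth
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

With a fitted model and matplotlib available, `plot_tree(model, filled=True)` would draw the same structure as a diagram.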
 

View Notebook

Random Forest

Applied bagging through the Random Forest classifier and analyzed the resulting feature importances.

Bagging, Random Forest, Feature Importance, Hyperparameter Tuning

    
python
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
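
Feature importance analysis reads the `feature_importances_` attribute of a fitted forest. A minimal sketch, assuming the Iris dataset as a stand-in (`rf_demo` is an illustrative name, separate from the `rf` model above):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf_demo = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
rf_demo.fit(iris.data, iris.target)

# Impurity-based importances, normalized to sum to 1
importances = rf_demo.feature_importances_
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{iris.feature_names[i]}: {importances[i]:.3f}")
```

These impurity-based scores can favor features with many distinct values; `sklearn.inspection.permutation_importance` is a common cross-check.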
 

View Notebook Live Demo

Ensemble Learning

Implemented ensemble techniques, including voting, bagging, and boosting, to improve model accuracy and robustness.

Voting Classifier, Stacking, Boosting, AdaBoost

    
python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

clf1 = LogisticRegression()
clf2 = SVC(probability=True)
eclf = VotingClassifier(estimators=[('lr', clf1), ('svc', clf2)], voting='soft')
eclf.fit(X_train, y_train)
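
The sample above shows voting only. Bagging and boosting follow the same fit and predict pattern; the sketch below assumes a synthetic dataset from `make_classification` and the default base learners (decision trees for bagging, depth-1 stumps for AdaBoost). All variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Bagging: independent trees on bootstrap samples, predictions averaged
bag = BaggingClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Boosting: weak learners fitted in sequence, each focusing on earlier mistakes
ada = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

ensemble_acc = {"bagging": bag.score(X_te, y_te), "adaboost": ada.score(X_te, y_te)}
print(ensemble_acc)
```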
 

View Notebook

Dimensionality Reduction (PCA)

Applied Principal Component Analysis for feature reduction, visualization, and improving model performance.

Variance, Eigenvalues, Feature Reduction, Visualization

    
python
from sklearn.decomposition import PCA
# Assumes X_scaled is the standardized feature matrix
pca = PCA(n_components=0.95)        # Keep enough components to explain 95% of the variance
X_reduced = pca.fit_transform(X_scaled)
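
With a fractional `n_components`, PCA picks the number of components itself, so it helps to inspect what it chose. A minimal sketch, assuming standardized Iris data as a stand-in for `X_scaled` (the `_demo` names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled_demo = StandardScaler().fit_transform(load_iris().data)

pca_demo = PCA(n_components=0.95)
X_reduced_demo = pca_demo.fit_transform(X_scaled_demo)

# explained_variance_ holds the eigenvalues of the covariance matrix;
# explained_variance_ratio_ is each eigenvalue's share of the total variance
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print("components kept:", pca_demo.n_components_)
print("eigenvalues:", pca_demo.explained_variance_)
print("cumulative variance ratio:", cumulative)
```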
 

View Notebook

K-Means Clustering

Unsupervised clustering using K-Means, with the elbow method and the silhouette score used to choose the number of clusters.

Elbow Method, Silhouette Score, Centroids, Unsupervised

    
python
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X_scaled)
labels = kmeans.predict(X_scaled)
centers = kmeans.cluster_centers_
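
The sample fixes `n_clusters=3`; the elbow method and silhouette score are how such a value can be chosen. The sketch below assumes synthetic blobs with three true centers and scans k from 2 to 6; the dataset and the range are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X_demo)
    inertias[k] = km.inertia_  # within-cluster sum of squares (elbow curve)
    silhouettes[k] = silhouette_score(X_demo, km.labels_)
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

# Highest silhouette score marks the best-separated clustering
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

Inertia always falls as k grows, so the elbow is the point where the drop flattens; the silhouette score gives a direct maximum to read off instead.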
 

View Notebook

DBSCAN Clustering

Density-based clustering for discovering clusters of arbitrary shape and detecting outliers.

Density-based, Outliers, Epsilon, Min Samples

    
python
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X_scaled)
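
DBSCAN marks outliers with the label -1, so counting clusters and noise points takes one extra step. A minimal sketch, assuming a two-moons toy dataset with two far-away points appended by hand; `eps=0.3` is tuned for this toy data and differs from the value in the sample above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two crescent-shaped clusters plus two hand-placed outliers
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
outliers = np.array([[3.0, 3.0], [-3.0, -3.0]])
X_demo = StandardScaler().fit_transform(np.vstack([X_moons, outliers]))

labels_demo = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_demo)

# Label -1 means noise; every other label is a cluster id
n_clusters = len(set(labels_demo)) - (1 if -1 in labels_demo else 0)
n_noise = int(np.sum(labels_demo == -1))
print("clusters:", n_clusters, "| noise points:", n_noise)
```

K-Means would split these crescents with a straight boundary, while a density-based method follows their shape.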
 

View Notebook