Logistic Regression: First Classification Model in ML

Yamini · Published in Geek Culture · 6 min read · Apr 8, 2021


Machine Learning is the part of AI where a machine learns from the data given to it with minimal human intervention. Machine Learning is divided into four categories: Supervised, Unsupervised, Semi-Supervised, and Reinforcement Learning. Supervised Learning is used when the labels are known and are given to the machine so that similar data can later be identified. Supervised ML is further divided into two types of tasks: Regression and Classification. The first basic regression algorithm to learn is Linear Regression, and the first classification algorithm to learn is Logistic Regression.

Today the topic under discussion is Logistic Regression, and it is better to know Linear Regression first. Still, you can read about Logistic Regression directly, and if you want to know about Linear Regression, click on Know about Linear Regression.

The question arises: if Logistic Regression is only used for classification, how is it named ‘Regression’ instead of ‘Classification’?

Answer here: this is a valid question. Logistic Regression is an extension of the linear regression model for classification tasks, hence the name ‘Logistic Regression’. Furthermore, logistic regression outputs a probability that lies between 0 and 1; thresholding that probability gives the discrete class values used in classification problems.

Logistic Regression that takes only two output values like 0 and 1 (i.e., dichotomous) is called Binary Logistic Regression. When the output can take more than two values, it is called Multinomial Logistic Regression. For example, predicting whether a person gets a ticket or not is solved using Binary Logistic Regression, whereas predicting whether tomorrow is going to be cloudy, rainy, or hot uses Multinomial Logistic Regression.

Logistic Regression

The theory below gives the intuition behind the logistic regression algorithm:

In logistic regression, the log-odds of a categorical response (target or output) being true or positive is modelled as a linear combination of the features. Mathematically,

log(p / (1 - p)) = w0 + w1x1 + w2x2 + ... + wjxj = w^T x

Applying the exponential on both sides, you get

p / (1 - p) = e^(w^T x)

Finally, solving for p, you get

p = e^(w^T x) / (1 + e^(w^T x)), or equivalently h_w(x) = 1 / (1 + e^(-w^T x))

The function above, p = h_w(x) = 1 / (1 + e^(-w^T x)), is called the logistic (sigmoid) function and gives the probability of the positive class; the log-odds quantity log(p / (1 - p)) is the logit.
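As a quick illustration, here is a minimal NumPy sketch of the sigmoid; the function name and the toy numbers are mine, for illustration only:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# h_w(x) = sigmoid(w . x) for a weight vector w and feature vector x
w = np.array([0.5, -1.2])
x = np.array([2.0, 1.0])
print(sigmoid(w @ x))  # probability of the positive class
```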

The likelihood for the logistic model is Π(class 1) h_w(xi) · Π(class 0) (1 - h_w(xi)). Taking logs, this can be written as Σ(i=1 to N) yi log h_w(xi) + (1 - yi) log(1 - h_w(xi)).

The above expression is the (log) Maximum Likelihood function, which needs to be maximized, as it gives the probability of the observed labels being true. It can be simplified further: substituting the logistic function derived above gives Σ(i=1 to N) (yi - 1)(w^T xi) - log(1 + e^(-w^T xi)).
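As a sanity check, a tiny NumPy sketch (with random toy data of my own) confirming that the two forms of the log-likelihood agree:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                 # toy weights
X = rng.normal(size=(5, 3))            # toy features
y = rng.integers(0, 2, size=5)         # toy 0/1 labels

z = X @ w                              # w^T x for every example
h = 1.0 / (1.0 + np.exp(-z))           # h_w(x)

# original form: sum of yi log h + (1 - yi) log(1 - h)
ll1 = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
# simplified form: sum of (yi - 1) w^T xi - log(1 + e^(-w^T xi))
ll2 = np.sum((y - 1) * z - np.log(1 + np.exp(-z)))

print(np.isclose(ll1, ll2))  # True
```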

Gradient descent can also be applied to the Logistic Regression cost function. For the average log loss J(w), the derivative works out to ∇J(w) = (1/N) Σ(i=1 to N) (h_w(xi) - yi) xi, and each step updates the weights as w := w - α∇J(w), where α is the learning rate.

Theta and w both indicate the weights and are the same; the derivative with respect to w or theta is the same.
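Here is a minimal NumPy sketch of batch gradient descent on the log loss; the function name and toy data are my own, not a reference implementation:

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Minimize the average log loss with batch gradient descent.

    X: (N, d) feature matrix (bias column assumed included),
    y: (N,) labels in {0, 1}. Returns the learned weight vector w.
    """
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # h_w(x) for every example
        grad = X.T @ (p - y) / N          # gradient of the average log loss
        w -= lr * grad                    # take one descent step
    return w

# toy usage: first column of ones acts as the bias term
X = np.array([[1, 0.5], [1, 1.5], [1, 3.0], [1, 4.5]])
y = np.array([0, 0, 1, 1])
print(fit_logistic_gd(X, y))
```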

For binary Logistic Regression the cost function is log loss, whereas for Multinomial Logistic Regression it is categorical cross-entropy. In binary Logistic Regression we have the sigmoid function, maximum likelihood estimation, and gradient descent; in Multinomial Logistic Regression we have the softmax function, the cross-entropy loss function, and stochastic gradient descent.

Multinomial Cross-Entropy

To explain this cross-entropy loss: the softmax function turns the raw scores into one probability per class, and the class with the highest probability is predicted. For example, if there are three classes, then f(1), f(2), and f(3) are calculated, and the loss penalizes a low probability on the true class.
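A small sketch of softmax and the cross-entropy loss for three classes (the scores and the true class are made up):

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    exp = np.exp(scores - np.max(scores))  # shift by the max for numerical stability
    return exp / exp.sum()

# three classes -> three probabilities f(1), f(2), f(3)
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.argmax())  # predicted class = the one with highest probability

# cross-entropy loss for true class 0: small when probs[0] is close to 1
true_class = 0
print(-np.log(probs[true_class]))
```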

In classical ML, binary Logistic Regression is used more often than Multinomial Logistic Regression, whereas in advanced ML the multinomial form comes into use more: Deep Learning problems are complex, and on large datasets we come across more multi-class categorical data than in simple datasets.

The evaluation metrics for a classification model are accuracy, precision, recall, false positive rate, and F1 score (or F-measure). All of these can be derived from the confusion matrix. The ROC AUC score is also an important evaluation metric for classification tasks and is very useful when the target classes are imbalanced.
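Most of these are one call away in scikit-learn; a quick sketch with placeholder labels and probabilities of my own:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = [0, 0, 1, 1, 1]            # placeholder true labels
y_pred = [0, 1, 1, 1, 0]            # placeholder predicted labels
y_prob = [0.2, 0.6, 0.9, 0.7, 0.4]  # placeholder class-1 probabilities

print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))

# false positive rate, read straight off the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(fp / (fp + tn))

print(roc_auc_score(y_true, y_prob))  # uses probabilities, not hard labels
```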

There are many hyperparameters in Logistic Regression, for example:

  • penalty: l1, l2, or elasticnet can be used.
  • C: the inverse of regularization strength; the lower the C value, the stronger the regularization, and the higher the C value, the weaker (or no) regularization.
  • multi_class: can be ‘auto’, ‘ovr’, or ‘multinomial’; ‘ovr’ fits one binary (one-vs-rest) classifier per class, ‘multinomial’ minimizes the full multinomial loss when there are more than two classes, and ‘auto’ chooses between them when nothing is stated.

You can see the full list of hyperparameters on the Sklearn Logistic Regression page on Scikit-learn.org.
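For example, an instantiation might look like the sketch below; the values are arbitrary, and note that the multi_class argument is deprecated in recent scikit-learn releases, so its availability depends on your version:

```python
from sklearn.linear_model import LogisticRegression

# Smaller C = stronger regularization; penalty and multi_class as described above.
model = LogisticRegression(penalty='l2', C=0.1, multi_class='auto', max_iter=1000)
```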

The implementation is simple, just a few lines of code; a sketch of all four steps follows the list below:

  • Fit the Logistic model to the data.
  • Find the accuracy of the model from the true and predicted values.
  • Read off the intercept and coefficients, as in Linear Regression.
  • Even the probabilities of class 0 and class 1 can be known using predict_proba.
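A minimal sketch of those steps, assuming the breast cancer dataset that ships with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# increase max_iter (or scale the features) if you see a convergence warning
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)                # fit the Logistic model to the data

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))      # accuracy from true vs predicted values

print(model.intercept_, model.coef_)       # intercept and coefficients, as in Linear Regression

print(model.predict_proba(X_test)[:5])     # probabilities of class 0 and class 1
```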

Not only can the probabilities be known, we can also readjust the threshold by which classes are predicted. By default the threshold is 0.5, meaning that if the probability is below 0.5 the model predicts class 0, and otherwise it predicts class 1. This can be changed to 0.2 or 0.4 or anything as per your needs. So flexible, right?
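Continuing the sketch above (model and X_test come from the previous block), with a hypothetical custom threshold of 0.3:

```python
# probability of class 1 for each test example
p1 = model.predict_proba(X_test)[:, 1]

threshold = 0.3                            # pick anything per your needs instead of 0.5
y_custom = (p1 >= threshold).astype(int)   # class 1 wherever p1 clears the threshold
print(y_custom[:10])
```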

Logistic Regression has its limitations:

  • Logistic Regression requires a large dataset, with sufficient training examples for all the categories it needs to identify.
  • Logistic Regression is a statistical model that attempts to predict precise probabilistic outcomes based on independent features. On high-dimensional datasets this may lead to the model over-fitting the training set, overstating the accuracy of its predictions there, so the model may not predict accurate results on the test set. This usually happens when the model is trained on little training data with lots of features. On high-dimensional datasets, regularization techniques should therefore be considered to avoid over-fitting (though this makes the model more complex); a very high regularization factor may even lead to the model under-fitting the training data.
  • Logistic Regression assumes linearity, which is rarely found in real-world problems. It is tough to capture complex relationships using logistic regression; more powerful algorithms such as Neural Networks can easily outperform it.
  • Logistic regression is less inclined to over-fitting, but it can overfit on high-dimensional datasets; one may consider regularization (L1 and L2) techniques to avoid over-fitting in these scenarios. Also, in Linear Regression the independent and dependent variables are related linearly, whereas Logistic Regression requires that the independent variables be linearly related to the log-odds, log(p / (1 - p)).
  • The presence of data values that deviate from the expected range (outliers) may lead to incorrect results, as this algorithm is sensitive to outliers.

Aim higher, Fly higher

Doubt kills more dreams than failure ever will

Make each day your masterpiece

If you find this article useful and have learned something, show some support and share it with other people who would like to know about it. If you have anything to say, I’m ready to listen to your suggestions, and let me know if you have any questions or requests. Keep doing your best. 🥰 Stay safe and keep bringing light to the world. We will meet in the next masterpiece. 😉
