Logistic Regression Model using Python

Nadir R.

Project Manager | Cloud | FinTech | Observability

Published Jun 1, 2024

Logistic regression is a popular and widely used statistical method for binary classification. It is a type of regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables. In this blog, we will explore the basics of logistic regression, how it works, and how to implement it using Python.

Introduction to Logistic Regression

Logistic regression is a type of regression analysis where the dependent variable is binary (0 or 1, true or false, success or failure). It estimates the probability that a given input belongs to a certain category. Unlike linear regression, logistic regression models the probability that a given input point belongs to a specific class.

The logistic function (also called the sigmoid function) is used to map predicted values to probabilities. The logistic function is defined as:

σ(x)=11+e−x\sigma(x) = \frac{1}{1 + e^{-x}}σ(x)=1+e−x1

where σ(x)\sigma(x)σ(x) is the output of the logistic function for input xxx.

How Logistic Regression Works

Logistic regression works by fitting a linear equation to the data, then applying the logistic function to the output of this linear equation to produce a probability. The linear equation has the form:

z=β0+β1x1+β2x2+…+βnxnz = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_nz=β0+β1x1+β2x2+…+βnxn

where:

β0\beta_0β0 is the intercept,
β1,β2,…,βn\beta_1, \beta_2, \ldots, \beta_nβ1,β2,…,βn are the coefficients,
x1,x2,…,xnx_1, x_2, \ldots, x_nx1,x2,…,xn are the input features.

The logistic function then transforms zzz to a value between 0 and 1, representing the probability that the input point belongs to the positive class.

Steps to Implement Logistic Regression in Python

To implement logistic regression in Python, we will use the popular machine learning library scikit-learn. Here are the steps involved:

Import Libraries: Import the necessary libraries.
Load Dataset: Load the dataset for training and testing.
Preprocess Data: Prepare the data for modeling.
Train the Model: Fit the logistic regression model to the training data.
Make Predictions: Use the model to make predictions on new data.
Evaluate the Model: Assess the performance of the model.

Step 1: Import Libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Load Dataset

For this example, we will use the famous Pima Indians Diabetes dataset, which is available in many data repositories.

# Load dataset
url = "https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv"
df = pd.read_csv(url)

# Display the first few rows of the dataset
df.head()

Recommended by LinkedIn

Understanding Bayesian with Examples In Python

Rany ElHousieny, PhDᴬᴮᴰ 1 year ago

Modular Markov Chain Monte Carlo in Python

Patrick Nicolas 1 year ago

Automating Data Cleaning with Python: A…

Walter Shields 3 months ago

Step 3: Preprocess Data

We need to preprocess the data by handling missing values (if any), normalizing the data, and splitting it into training and testing sets.

# Check for missing values
df.isnull().sum()

# Split the data into features (X) and target (y)
X = df.drop('Outcome', axis=1)
y = df['Outcome']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 4: Train the Model

Next, we will create an instance of the logistic regression model and fit it to the training data.

# Create an instance of Logistic Regression
model = LogisticRegression(max_iter=1000)

# Fit the model to the training data
model.fit(X_train, y_train)

Step 5: Make Predictions

We can now use the trained model to make predictions on the test data.

# Make predictions on the test data
y_pred = model.predict(X_test)

Step 6: Evaluate the Model

Finally, we will evaluate the model's performance using various metrics.

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Generate confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Generate classification report
class_report = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_report)

# Visualize the confusion matrix
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

4. Example: Predicting the Likelihood of Diabetes

Let's put it all together in a complete example where we predict the likelihood of diabetes based on various health metrics.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
url = "https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv"
df = pd.read_csv(url)

# Preprocess data
X = df.drop('Outcome', axis=1)
y = df['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

class_report = classification_report(y_test, y_pred)
print("Classification Report:")
print(class_report)

sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

Logistic regression is a powerful tool for binary classification problems. It is simple to understand and easy to implement, making it a great choice for many practical applications. In this blog, we covered the basics of logistic regression, how it works, and how to implement it in Python using scikit-learn. We also provided a complete example using the Pima Indians Diabetes dataset.

By following these steps, you can apply logistic regression to your own binary classification problems and gain valuable insights from your data.

Author

Nadir Riyani is an accomplished and visionary Engineering Manager specialising in AI/ML technologies. With a wealth of experience leading high-performing engineering teams, Nadir is passionate about leveraging artificial intelligence and machine learning to drive innovation and solve complex challenges. His expertise spans across software development principles, encompassing Agile, Automation and DevOps methodologies. Nadir's commitment to engineering excellence and ability to align technical strategies with business objectives make him a valuable asset to any organization. For further inquiries, please feel free to reach out to him at riyaninadir@gmail.com.

Adrian Olszewski

Clinical Trials Biostatistician at 2KMM (100% R-based CRO) ⦿ Frequentist/NHST (non-Bayesian) paradigm only ⦿ NOT a Data Scientist (no ML/AI/Big data) ⦿ Against anti-car/-meat/-cash/-house and C40 restrictions

Let me only add that outside Machine Learning, in classic statistics, it's a key example of a regression algorithm - just by its definition of conditional expectation. No surprise - it was invented (McFadden, Berkson, Cox, Nelder) and embedded into the Generalized Linear Model family (Nelder, Wedderburn) exactly for the purpose of doing regression work years before it was used for classification (mainly to replace the probit regression). Nowadays it is used for regression and testing hypotheses by thousands of statisticians, e.g. in experimental and exploratory research, like clinical trials. For instance, it's my daily regression tool (and I have never used it for classification so far). I mention that, because some of ML specialists may want to join, one day, a field, where classic statistics is used and the logistic regression has many more applications (and classification is not the major one). https://www.linkedin.com/pulse/logistic-regression-has-been-since-its-birth-adrian-olszewski-haygf/ or - if you prefer Medium - https://medium.com/@r.clin.res/is-logistic-regression-a-regression-46dcce4945dd?source=friends_link&sk=e9cb5449363197f85c0d6d0f4211a562

1 Reaction

To view or add a comment, sign in

Sign in

Stay updated on your professional world

By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.

New to LinkedIn? Join now

Logistic Regression Model using Python

Nadir R.

Project Manager | Cloud | FinTech | Observability

Introduction to Logistic Regression

How Logistic Regression Works

Steps to Implement Logistic Regression in Python

Step 1: Import Libraries

Step 2: Load Dataset

Recommended by LinkedIn

Step 3: Preprocess Data

Step 4: Train the Model

Step 5: Make Predictions

Step 6: Evaluate the Model

4. Example: Predicting the Likelihood of Diabetes

More articles by Nadir R.

Sign in

Others also viewed

IV Implementing a Systemic Dimensional Cyberprofiling Model in Python

Building a Machine Learning Model from Scratch Using Python

📊 Mellin Transform Demystified: Python Tackles Satellite Patterns 🛰️

SIMPLE LINEAR REGRESSION IN PYTHON :

Mastering Logistic Regression: The Complete Guide with Python Code

How to build Gradient Boosting Regressor in Python?

The Ultimate Guide To Speech Recognition With Python

Python Speech Recognition – Artificial Intelligence

17 Top Applications of Machine Learning with Python

Building a Simple Spam Detection Model Using Python and Machine Learning

Explore topics

Introduction to Logistic Regression

How Logistic Regression Works

Steps to Implement Logistic Regression in Python

Step 1: Import Libraries

Step 2: Load Dataset

Recommended by LinkedIn

Step 3: Preprocess Data

Step 4: Train the Model

Step 5: Make Predictions

Step 6: Evaluate the Model

4. Example: Predicting the Likelihood of Diabetes

More articles by Nadir R.

AWS: Hosting Relational Databases via Amazon RDS

AWS: How to host web application?

BrightLocal SEO Tool

Rasa: Implementation using Python

Rasa: Revolutionizing Chatbots and AI Assist

Plan, Do, Check, and Act (PDCA) QA Cycle

make(Integromat) - Start simple. Think big. Automate smart.

Microsoft Auto Gen: Revolutionizing Automation and AI for Developers

CodeWhisperer: Amazon’s AI-Powered Coding Assistant

Axe by Deque: Tool for Web Accessibility Testing

Sign in

Others also viewed

IV Implementing a Systemic Dimensional Cyberprofiling Model in Python

Building a Machine Learning Model from Scratch Using Python

📊 Mellin Transform Demystified: Python Tackles Satellite Patterns 🛰️

SIMPLE LINEAR REGRESSION IN PYTHON :

Mastering Logistic Regression: The Complete Guide with Python Code

How to build Gradient Boosting Regressor in Python?

The Ultimate Guide To Speech Recognition With Python

Python Speech Recognition – Artificial Intelligence

17 Top Applications of Machine Learning with Python

Building a Simple Spam Detection Model Using Python and Machine Learning

Explore topics