Logistic regression: Difference between revisions

From CS Wiki
 
'''Logistic Regression''' is a statistical and machine learning algorithm used for binary classification tasks, where the output variable is categorical and typically represents two classes (e.g., yes/no, spam/not spam, fraud/not fraud). The name comes from the fact that the model fits a linear regression to the log-odds of the outcome. In practice it is used as a classification algorithm: it predicts the probability of each class, and a threshold on that probability turns the prediction into a class label.
 
== How It Works ==
 
Logistic Regression models the probability of a binary outcome using the logistic function, also known as the sigmoid function. The sigmoid function maps any real number to a value between 0 and 1, which is interpreted as the probability of belonging to the positive class (1). The model first computes a weighted sum of the input features, passes it through the sigmoid, and then assigns a class by comparing the resulting probability with a threshold, often 0.5.
 
The logistic function is represented by:
 
<math>P(y=1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n)}}</math>
 
where:
* '''P(y=1 | X)''' is the probability of the output being 1 given the input features.
* '''X1, X2, ..., Xn''' are the input features.
* '''b0''' is the intercept, and '''b1, b2, ..., bn''' are the coefficients of the features.
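
The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the coefficient values are hypothetical and chosen by hand, not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, intercept, coefficients):
    """P(y=1 | X) for one observation: sigmoid of b0 + b1*X1 + ... + bn*Xn."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

def predict_class(features, intercept, coefficients, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    proba = predict_proba(features, intercept, coefficients)
    return 1 if proba >= threshold else 0

# Hypothetical coefficients, picked by hand for illustration only.
b0 = -1.0
b = [2.0, -0.5]

# Linear part: -1 + 2*1 - 0.5*2 = 0, and sigmoid(0) is exactly 0.5.
p = predict_proba([1.0, 2.0], b0, b)
print(p)                                  # 0.5
print(predict_class([1.0, 2.0], b0, b))   # 1 (0.5 meets the default threshold)
```

Raising or lowering <code>threshold</code> trades false positives against false negatives without changing the fitted coefficients.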
 
== Types of Logistic Regression ==
 
* '''Binary Logistic Regression''': Used for binary classification with two possible outcomes (e.g., yes/no).
* '''Multinomial Logistic Regression''': Used when the outcome variable has more than two categories without any ordering (e.g., classifying types of animals).
* '''Ordinal Logistic Regression''': Used when the outcome variable has ordered categories (e.g., ranking levels from low to high).
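
For the multinomial case, a common formulation replaces the sigmoid with the softmax function, which turns one linear score per class into probabilities that sum to 1. The sketch below assumes that formulation; the scores and class names are hypothetical.

```python
import math

def softmax(scores):
    """Convert one linear score per class into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for three unordered classes.
classes = ["cat", "dog", "bird"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(probs, predicted)

# With two classes and scores [z, 0], softmax gives the same value as sigmoid(z).
z = 1.5
binary = softmax([z, 0.0])[0]
```

The last lines show why binary Logistic Regression can be seen as the two-class special case of the multinomial model.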
 
== Applications of Logistic Regression ==
 
Logistic Regression is widely used across industries due to its simplicity, interpretability, and effectiveness in binary classification tasks:
 
* '''Healthcare''': Predicting disease outcomes, risk assessments, and patient survival chances.
* '''Finance''': Credit scoring, fraud detection, and risk analysis.
* '''Marketing''': Customer churn prediction, targeting potential buyers, and lead qualification.
* '''Social Sciences''': Survey analysis, where responses fall into categories like agree/disagree or support/oppose.
 
== Key Metrics for Evaluating Logistic Regression ==
 
To assess the performance of a Logistic Regression model, common metrics include:
 
* '''Accuracy''': The proportion of correct predictions.
* '''Precision''': The ratio of true positive predictions to all positive predictions.
* '''Recall''': The ratio of true positive predictions to all actual positives.
* '''F1 Score''': The harmonic mean of precision and recall, useful when dealing with imbalanced data.
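* '''AUC-ROC''': The area under the ROC curve (AUC) measures the model’s ability to distinguish between classes across all classification thresholds. A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a higher AUC indicates better performance.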
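
As a rough illustration, the metrics listed above can be computed directly from true labels and predicted probabilities. The data below is made up, and the AUC is computed with the pairwise-ranking definition (the share of positive/negative pairs in which the positive example receives the higher score), which is one of several equivalent ways to obtain it.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (0 or 1)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Assumes both classes are present in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC is computed from the raw scores and does not.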
 
== Assumptions of Logistic Regression ==
 
Logistic Regression relies on several assumptions for accurate results:
 
1. '''Linearity of Independent Variables and Log-Odds''': Assumes a linear relationship between the log-odds of the outcome and the independent variables.
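2. '''Independence of Observations''': Observations should be independent of each other. Dependent observations, such as repeated measurements of the same subject, tend to produce standard errors that are too small and therefore overconfident conclusions.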
3. '''No Multicollinearity''': Independent variables should not be highly correlated with each other, which can be checked using Variance Inflation Factor (VIF).
4. '''Sufficient Sample Size''': Logistic Regression requires a large enough sample size, especially for categorical variables, to make accurate predictions.
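
The multicollinearity check can be illustrated for the simplest case. With exactly two predictors, the VIF of either one reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below covers only that special case (the general VIF requires regressing each predictor on all the others), and the data is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    Raises ZeroDivisionError if the predictors are perfectly correlated.
    """
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical feature columns.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]    # roughly 2 * x1, so strongly collinear
x3 = [2.0, -1.0, 0.0, -1.0, 2.0]  # constructed to be uncorrelated with x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print(vif_collinear, vif_independent)
```

A VIF close to 1 indicates little collinearity. Values above roughly 5 to 10 are a commonly cited rule of thumb for a problem, not a strict cutoff.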
 
== Handling Limitations ==
 
Logistic Regression may not perform well when the relationship between the features and the log-odds of the outcome is highly non-linear. In such cases, consider transforming the features, adding polynomial features, or switching to a more flexible model such as Decision Trees or Neural Networks.
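
The effect of a polynomial feature can be seen in a small one-dimensional sketch. The data and coefficients below are hypothetical and hand-picked rather than fitted: class 1 occurs only when x is close to zero, so no single cut point on x can separate the classes, while adding an x^2 term can.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_linear(x, b0, b1):
    """Classify using only the raw feature x."""
    return 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0

def predict_quadratic(x, b0, b1, b2):
    """Classify using x and the added polynomial feature x^2."""
    return 1 if sigmoid(b0 + b1 * x + b2 * x * x) >= 0.5 else 0

# Hypothetical data: the positive class sits in the middle of the range.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [0, 0, 1, 1, 1, 0, 0]

# A model that is linear in x is monotone in x, so it can only split the
# line into "left" and "right" and cannot match a 0, 1, 0 pattern.
linear_preds = [predict_linear(x, 0.0, 1.0) for x in xs]

# Hand-picked coefficients: 1 - x^2 is positive only for -1 < x < 1.
quad_preds = [predict_quadratic(x, 1.0, 0.0, -1.0) for x in xs]

print(linear_preds)
print(quad_preds)
```

The model is still linear in its coefficients. Only the feature set has changed, which is why this approach keeps much of the interpretability of plain Logistic Regression.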
 
== See Also ==
* [[Linear Regression]]
* [[Support Vector Machine]]
* [[K-Nearest Neighbor]]
* [[Decision Tree]]
* [[Naive Bayes]]

Latest revision as of 21:33, 3 November 2024