Naive Bayes

From CS Wiki
Revision as of 20:40, 3 November 2024 by 핵톤

The Naive Bayes algorithm is a probabilistic classification method that uses conditional probabilities to estimate how likely a data point is to belong to each class. As the term "naive" suggests, it assumes that the features are conditionally independent of one another given the class. Although this assumption rarely holds in practice, Naive Bayes is practical and efficient, and it often performs well on real-world data.

Naive Bayes is particularly useful in text classification tasks such as email spam filtering, sentiment analysis, and document categorization. Its core idea is to use Bayes' theorem to compute the posterior probability of each class and to assign the input to the class with the highest posterior probability.
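
The decision rule described above can be written out as follows. The notation is an assumption made here for illustration (the article does not fix one): C is a class and x_1, ..., x_n are the feature values of the input. The first line is Bayes' theorem; the second applies the independence assumption and drops the denominator, which is the same for every class.

```latex
\begin{align*}
P(C \mid x_1, \dots, x_n) &= \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)} \\
                          &\propto P(C) \prod_{i=1}^{n} P(x_i \mid C) \\
\hat{C} &= \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
\end{align*}
```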

Characteristics

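  • Simple and Fast: Training and prediction are computationally cheap, which makes it suitable for large datasets.
  • Independence Assumption: Assumes the features are conditionally independent given the class, so each feature's probability can be estimated separately, which reduces computational complexity.
  • Accuracy: Despite the unrealistic independence assumption, it often performs well on text classification and on other data where individual features carry clear class-specific patterns.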

Example

Naive Bayes was widely used in early email spam filters. The general method is as follows: the classifier first learns how often certain words appear in spam and in non-spam emails. For example, if words like "free" or "win" appear much more often in spam than in non-spam, an email containing them is predicted to have a higher chance of being spam.

  1. Data Preparation: Prepare a dataset of emails labeled as spam or non-spam.
  2. Probability Calculation: Estimate from the labeled data the probability of each word (e.g., "free," "win," "welcome") appearing in spam and in non-spam emails.
  3. Classification: When a new email arrives, combine the prior probability of each class with the probabilities of the email's words under that class to compute how likely the email is to be spam and how likely it is to be non-spam.
  4. Result: Classify the email as "spam" or "non-spam," whichever has the higher computed probability.
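
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production filter, and it makes several assumptions that are not part of the article: the toy emails are invented, words are split on whitespace, word probabilities use add-one (Laplace) smoothing so that unseen words do not produce a zero probability, and scores are summed in log space to avoid numerical underflow. The function names `train`, `log_score`, and `classify` are hypothetical.

```python
import math
from collections import Counter

LABELS = ("spam", "non-spam")

def train(emails):
    """Steps 1-2: count emails per class and words per class."""
    doc_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return doc_counts, word_counts, vocab

def log_score(text, label, model):
    """Step 3: log P(label) + sum of log P(word | label)."""
    doc_counts, word_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    total_words = sum(word_counts[label].values())
    for word in text.lower().split():
        if word not in vocab:
            continue  # assumption: ignore words never seen in training
        # Add-one smoothing keeps the probability strictly positive.
        count = word_counts[label][word] + 1
        score += math.log(count / (total_words + len(vocab)))
    return score

def classify(text, model):
    """Step 4: pick the class with the highest score."""
    return max(LABELS, key=lambda label: log_score(text, label, model))

# Invented toy data, for illustration only.
TOY_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money win big", "spam"),
    ("welcome to the team meeting", "non-spam"),
    ("lunch meeting at noon", "non-spam"),
]

model = train(TOY_EMAILS)
print(classify("win free money", model))      # spam
print(classify("team lunch at noon", model))  # non-spam
```

Because the score is a sum of independent per-word terms, training needs only one pass of counting, which is where the method's speed comes from.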

Variants

Several variants of the Naive Bayes model exist; the most prominent are Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes.
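
As a rough guide, these three variants differ in how they model the likelihood of the features given the class. The forms below are the standard textbook definitions, not taken from the article, and the symbols are assumed notation: the mean and variance of feature i in class C for the Gaussian model, and per-class feature probabilities for the Multinomial and Bernoulli models.

```latex
\begin{align*}
\text{Gaussian (continuous } x_i\text{):} \quad
  & P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma_{C,i}^2}}
    \exp\!\left(-\frac{(x_i - \mu_{C,i})^2}{2\sigma_{C,i}^2}\right) \\
\text{Multinomial (counts } x_i\text{):} \quad
  & P(x_1, \dots, x_n \mid C) \propto \prod_{i=1}^{n} \theta_{C,i}^{\,x_i} \\
\text{Bernoulli (binary } x_i\text{):} \quad
  & P(x_i \mid C) = p_{C,i}^{\,x_i} \, (1 - p_{C,i})^{1 - x_i}
\end{align*}
```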