New pages

From CS Wiki
  • 12:03, 4 November 2024 Feature (Data Science) [3,321 bytes] 핵톤 (Created page with "In data science, a '''feature''' is an individual measurable property or characteristic of a data point that is used as input to a predictive model. Terms such as '''feature, columns, attributes, variables, and independent variables''' are often used interchangeably to refer to the input characteristics in a dataset that are used for analysis or model training. ==Types of Features== Features can take various forms depending on the type of data and the problem being solve...") Tag: Visual edit
  • 00:35, 4 November 2024 Cold Start Problem [3,384 bytes] 핵톤 (Created page with "The Cold Start Problem is a common challenge in recommender systems, where the system struggles to make accurate recommendations due to a lack of sufficient data. This problem affects new users, new items, or entire systems that lack historical data, limiting the effectiveness of collaborative and content-based filtering techniques. ==Types of Cold Start Problems== Cold start issues can occur in several contexts: *'''User Cold Start''': When a new user joins the platform...") Tag: Visual edit
  • 00:34, 4 November 2024 Content-Based Filtering [3,108 bytes] 핵톤 (Created page with "Content-Based Filtering is a recommendation technique that suggests items to users based on the characteristics of items they have previously shown interest in. Unlike collaborative filtering, which relies on user behavior patterns, content-based filtering uses item attributes or features to make recommendations. ==How Content-Based Filtering Works== Content-based filtering involves analyzing item attributes and matching them to a user’s preferences or past interaction...") Tag: Visual edit
  • 00:31, 4 November 2024 Collaborative Filtering [3,798 bytes] 핵톤 (Created page with "Collaborative Filtering is a popular technique in recommender systems that predicts a user’s interest by identifying patterns from the behavior and preferences of similar users or items. It relies on the assumption that if users have agreed on past items, they are likely to agree on similar items in the future. ==Types of Collaborative Filtering== Collaborative Filtering can be divided into two main approaches: *'''User-Based Collaborative Filtering''': Recommends item...") Tag: Visual edit
  • 00:29, 4 November 2024 Recommender System [3,866 bytes] 핵톤 (Created page with "A Recommender System is a data-driven algorithm designed to suggest relevant items or content to users based on their preferences, behavior, or similar users’ choices. It is widely used in e-commerce, streaming services, social media, and other online platforms to enhance user experience by delivering personalized recommendations. ==Types of Recommender Systems== There are several main types of recommender systems, each with different approaches to making recommendatio...") Tag: Visual edit
  • 00:27, 4 November 2024 AUC [34 bytes] 핵톤 (Redirected page to Area Under the Curve) Tags: New redirect Visual edit
  • 00:26, 4 November 2024 Precision-Recall Curve [3,840 bytes] 핵톤 (Created page with "The Precision-Recall Curve is a graphical representation used in binary classification to evaluate a model's performance, especially in imbalanced datasets. It plots precision (y-axis) against recall (x-axis) at various threshold settings, showing the trade-off between the two metrics as the decision threshold changes. ==What is a Precision-Recall Curve?== A Precision-Recall Curve shows how well a model balances precision (the accuracy of positive predictions) and recall...") Tag: Visual edit
  • 00:19, 4 November 2024 Gain Chart [3,862 bytes] 핵톤 (Created page with "A Gain Chart, or Cumulative Gain Chart, is a graphical tool used to evaluate the effectiveness of a predictive model by showing the cumulative percentage of positive outcomes identified as more of the dataset is included. It helps assess how well the model ranks positive cases, particularly in applications where targeting high-value instances is essential. ==What is a Gain Chart?== A Gain Chart plots the cumulative percentage of positive outcomes (y-axis) against the cum...") Tag: Visual edit
  • 00:19, 4 November 2024 Cumulative Response Curve [3,296 bytes] 핵톤 (Created page with "'''Cumulative Response Curve (CRC)''' is a graphical tool used in predictive modeling and data science to assess a model's ability to capture positive outcomes as more of the dataset is selected. It provides insight into how effectively a model identifies the highest value cases early in the ranking. ==What is a Cumulative Response Curve?== A Cumulative Response Curve plots the cumulative percentage of actual positive instances (y-axis) against the cumulative percentage...") Tag: Visual edit
  • 00:18, 4 November 2024 Lift Curve [3,328 bytes] 핵톤 (Created page with "A '''Lift Curve''' is a graphical representation used in predictive modeling to measure the effectiveness of a model in identifying positive outcomes, compared to a baseline of random selection. It shows how much more likely the model is to capture positive cases within selected segments compared to a random approach. ==What is a Lift Curve?== A Lift Curve plots the lift (y-axis) against the cumulative percentage of the dataset selected (x-axis). It illustrates how well...") Tag: Visual edit
  • 00:17, 4 November 2024 Cumulative Response Curves [65 bytes] 핵톤 (Created page with "'''Cumulative Response Curves (CRC)''' are graphical tools used in predictive modeling and data science to assess a model's ability to capture positive outcomes as more of the dataset is selected. They provide insight into how effectively a model identifies the highest value cases early in the ranking. ==What is a Cumulative Response Curve?== A Cumulative Response Curve plots the cumulative percentage of actual positive instances (y-axis) against the cumulative percentag...") Tag: Visual edit
  • 00:16, 4 November 2024 Gain (Data Science) [3,713 bytes] 핵톤 (Created page with "'''Gain''' is a metric used in data science, marketing, and predictive modeling to measure the cumulative success of a model in capturing positive outcomes as more of the dataset is utilized. It provides insight into how effectively a model ranks and selects positive cases, particularly in applications where maximizing the return on targeted resources is essential. ==What is Gain?== Gain quantifies the cumulative proportion of positive outcomes identified by the model as...") Tag: Visual edit
  • 00:15, 4 November 2024 Lift (Data Science) [3,303 bytes] 핵톤 (Created page with "'''Lift''' is a metric used in marketing, sales, and data science to measure the effectiveness of a predictive model, especially in identifying positive outcomes such as likely buyers or high-risk customers. It quantifies how much better a model performs in comparison to random chance. ==Understanding Lift== Lift evaluates the concentration of positive instances (e.g., buyers, responders) within a selected group compared to the overall rate of positives in the entire pop...") Tag: Visual edit
  • 00:12, 4 November 2024 Data Science Cheat Sheet [6,117 bytes] 핵톤 (Created page with "== Confusion Matrix and F1 Score == '''Confusion Matrix''' {| class="wikitable" |- ! !!Predicted Positive!!Predicted Negative |- |'''Actual Positive'''||True Positive (TP)||False Negative (FN) |- |'''Actual Negative'''||False Positive (FP)||True Negative (TN) |} '''F1 Score''' = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (Positive Predictive Value * True Positive Rate) / (Positive Predictive Value + True Positive Rate) = 2 * TP / (2 * TP + FP + FN) ==...") Tag: Visual edit
  • 00:11, 4 November 2024 Specificity (Data Science) [2,246 bytes] 핵톤 (Created page with "'''Specificity''', also known as the '''True Negative Rate (TNR)''', is a metric used in binary classification to measure the proportion of actual negative cases that are correctly identified by the model. It reflects the model’s ability to avoid false positives and accurately classify negative instances. ==Definition== Specificity is calculated as: :'''<big>Specificity = True Negatives / (True Negatives + False Positives)</big>''' A higher specificity value indicates...") Tag: Visual edit
  • 23:59, 3 November 2024 Sensitivity [35 bytes] 핵톤 (Redirected page to Recall (Data Science)) Tags: New redirect Visual edit
  • 23:59, 3 November 2024 True Positive Rate [35 bytes] 핵톤 (Redirected page to Recall (Data Science)) Tags: New redirect Visual edit
  • 23:57, 3 November 2024 False Positive Rate [2,077 bytes] 핵톤 (Created page with "The '''False Positive Rate (FPR)''' is a metric used in binary classification to measure the proportion of actual negatives that are incorrectly identified as positives by the model. It is an important metric for understanding the model's tendency to produce false alarms. ==Definition== The False Positive Rate is calculated as: :'''FPR = False Positives / (False Positives + True Negatives)''' This metric represents the likelihood of a negative instance being misclassifie...") Tag: Visual edit
  • 22:15, 3 November 2024 Area Under the Curve [2,354 bytes] 핵톤 (Created page with "The Area Under the Curve (AUC) is a metric used in classification tasks to evaluate the overall performance of a binary classification model. It represents the area under the ROC (Receiver Operating Characteristic) Curve, providing a single value that summarizes the model’s ability to distinguish between positive and negative classes across all thresholds. ==Definition== AUC values range from 0 to 1: *'''AUC = 1''': Indicates a perfect classifier that co...") Tag: Visual edit
  • 22:13, 3 November 2024 ROC Curve [2,501 bytes] 핵톤 (Created page with "The '''ROC (Receiver Operating Characteristic) Curve''' is a graphical representation used to evaluate the performance of a binary classification model. It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings, providing insight into the trade-offs between sensitivity and specificity. ==Definition== The ROC Curve is created by plotting: *'''True Positive Rate (TPR)''' or Sensitivity: TPR = True Positive...") Tag: Visual edit
  • 22:05, 3 November 2024 Classification Metrics [3,024 bytes] 핵톤 (Created page with "'''Classification metrics''' are evaluation measures used to assess the performance of classification models in machine learning and data science. These metrics help determine how well a model can predict the correct class labels, particularly in supervised learning tasks. ==Common Classification Metrics== There are several widely used classification metrics, each serving different aspects of model performance: *'''Accuracy''': Measures the ratio of correct predictions t...") Tag: Visual edit
  • 22:03, 3 November 2024 Confusion Matrix [2,343 bytes] 핵톤 (Created page with "'''Confusion Matrix''' is a tool used in data science and machine learning to evaluate the performance of a classification model. It provides a tabular summary of the model's predictions against the actual values, breaking down the number of correct and incorrect predictions for each class. ==Structure== The confusion matrix is typically a 2x2 table for binary classification, with the following layout: *'''True Positives (TP)''': Correctly predicted positive instances *...") Tag: Visual edit
  • 21:59, 3 November 2024 Recall (Data Science) [2,267 bytes] 핵톤 (Created page with "'''Recall''' is a metric used in data science, particularly in classification problems, to measure the completeness of positive predictions. It represents the ratio of true positive predictions to the sum of true positives and false negatives, reflecting the model's ability to identify all relevant instances within the data. ==Definition== Recall is calculated as: :'''Recall = True Positives / (True Positives + False Negatives)''' This metric is crucial when the focus is...") Tag: Visual edit
  • 21:58, 3 November 2024 Precision (Data Science) [2,396 bytes] 핵톤 (Created page with "'''Precision''' is a metric used in data science, particularly in classification problems, to measure the accuracy of positive predictions. It is defined as the ratio of true positive predictions to the sum of true positive and false positive predictions, offering insights into the model's performance in correctly identifying positive instances. ==Definition== Precision is calculated as: :'''<big>Precision = True Positives / (True Positives + False Positives)</big>''' Th...") Tag: Visual edit
  • 21:55, 3 November 2024 Accuracy [37 bytes] 핵톤 (Redirected page to Accuracy (Data Science)) Tags: New redirect Visual edit
  • 21:55, 3 November 2024 Accuracy (Data Science) [2,174 bytes] 핵톤 (Created page with "Accuracy is a metric used in data science to measure the performance of a model, particularly in classification problems. It represents the ratio of correctly predicted instances to the total number of instances. ==Definition== Accuracy is calculated as: :'''<big>Accuracy = (True Positives + True Negatives) / (Total Number of Instances)</big>''' This metric is often used in classification problems, where the goal is to determine how well a model can predict class labels....") Tag: Visual edit
  • 21:51, 3 November 2024 Gini Impurity (Data Science) [2,355 bytes] 핵톤 (Created page with "'''Gini Impurity''' is a metric used in data science, particularly in decision tree algorithms, to measure the "impurity" or diversity of a dataset. It helps in determining how well a split at a node separates the data into distinct classes, making it essential for classification problems. ==Definition== Gini impurity calculates the probability that a randomly chosen element from a dataset will be incorrectly classified if it is randomly labeled accordi...") Tag: Visual edit
  • 21:46, 3 November 2024 Entropy (Data Science) [5,845 bytes] 핵톤 (Created page with "'''Entropy (Data Science)''' In '''Data Science''', '''Entropy''' is a measure of randomness or uncertainty in a dataset. Often used in Decision Trees and other machine learning algorithms, entropy quantifies the impurity or unpredictability of information in a set of data. In classification tasks, entropy helps determine the best way to split data to reduce uncertainty and increase homogeneity in the resulting subsets. ==How Entropy Works== Entropy, denoted as H, is ca...") Tag: Visual edit
  • 21:43, 3 November 2024 Decision Tree [3,463 bytes] 핵톤 (Created page with "'''Decision Tree''' A '''Decision Tree''' is a supervised learning algorithm used for both classification and regression tasks. It structures decisions as a tree-like model, where each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node represents a class label or prediction. Decision Trees are highly interpretable and can work with both categorical and numerical data, making them widely applicable across vari...") Tag: Visual edit
  • 21:39, 3 November 2024 Random Forest [3,898 bytes] 핵톤 (Created page with "'''Random Forest''' is an ensemble learning method that combines multiple Decision Trees to improve classification or regression accuracy. It is designed to mitigate the limitations of single Decision Trees, such as overfitting and sensitivity to data variations, by building a "forest" of trees and aggregating their predictions. This approach often leads to greater model stability and accuracy. ==How It Works== Random Forest creates multiple Decision Trees during trainin...") Tag: Visual edit
  • 21:33, 3 November 2024 Logistic Regression [3,911 bytes] 핵톤 (Created page with "'''Logistic regression''' is a statistical and machine learning algorithm used for binary classification tasks, where the output variable is categorical and typically represents two classes (e.g., yes/no, spam/not spam, fraud/not fraud). Despite its name, Logistic Regression is a classification algorithm, not a regression algorithm, as it predicts probabilities of classes rather than continuous values. ==How It Works== Logistic Regression models the probability of a bin...") Tag: Visual edit
  • 21:29, 3 November 2024 Support Vector Machine [4,232 bytes] 핵톤 (Created page with "'''Support Vector Machine (SVM)''' is a powerful supervised machine learning algorithm used for both classification and regression tasks, though it is primarily used in classification. SVM works by finding the optimal boundary, or hyperplane, that best separates the data points of different classes. SVM is effective in high-dimensional spaces and is especially suitable for binary classification problems. ==How It Works== SVM aims to maximize the margin between data point...") Tag: Visual edit
  • 21:18, 3 November 2024 Independence (Linear Regression) [2,421 bytes] 핵톤 (Created page with "In the context of '''Linear Regression''', '''independence''' refers to the assumption that each observation in the dataset is independent of the others. This assumption is crucial for producing unbiased estimates and valid predictions. When observations are independent, it implies that the value of one observation does not influence or provide information about another observation. ==Importance of the Independence Assumption== Independence is a foundational assumption f...") Tag: Visual edit
  • 21:15, 3 November 2024 Linear Regression [3,427 bytes] 핵톤 (Created page with "'''Linear Regression''' is a fundamental regression algorithm used in machine learning and statistics to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables, which means the change in the dependent variable is proportional to the change in the independent variables. Linear Regression is commonly used for predictive analysis and trend forecasting. ==Types of Linear Regression== T...") Tag: Visual edit
  • 21:08, 3 November 2024 Discrete [2,011 bytes] 핵톤 (Created page with "In mathematics and computer science, '''discrete''' refers to distinct, separate values or entities, as opposed to continuous values. Discrete data or structures consist of isolated points or categories, often represented by integers or categorical labels. In contrast, continuous data have values that fall within a range and can take on any value within that interval. ==Examples of Discrete Data== Discrete data is commonly found in many fields and applications: *'''Count...") Tag: Visual edit
  • 21:05, 3 November 2024 Classification Algorithm [4,578 bytes] 핵톤 (Created page with "'''Classification algorithms''' are a group of machine learning methods used to categorize data into discrete classes or labels. These algorithms learn from labeled data during training and make predictions by assigning an input to one of several possible categories. Classification is widely applied in areas like image recognition, spam filtering, and medical diagnosis. ==Types of Classification Algorithms== There are various types of classification algorithms, each with...") Tag: Visual edit
  • 21:00, 3 November 2024 Regression Algorithm [4,309 bytes] 핵톤 (Created page with "'''Regression algorithms''' are a family of machine learning methods used for predicting continuous numerical values based on input features. Unlike classification, which predicts discrete classes, regression predicts outputs that can take any real number value. Regression algorithms are widely used in various fields, such as finance, economics, and environmental science, where predicting quantities (like stock prices, sales, or temperatures) is essential. ==Types of Reg...") Tag: Visual edit
  • 20:57, 3 November 2024 K-Nearest Neighbor [3,441 bytes] 핵톤 (Created page with "'''K-Nearest Neighbor''', often abbreviated as '''K-NN''', is a simple and intuitive classification and regression algorithm used in supervised machine learning. It classifies new data points based on the majority class among its nearest neighbors in the feature space. K-NN is a non-parametric algorithm, meaning it makes no assumptions about the underlying data distribution, making it v...") Tag: Visual edit
  • 20:46, 3 November 2024 Data Science [3,661 bytes] 핵톤 (Created page with "Data Science is an interdisciplinary field that uses scientific methods, algorithms, and systems to extract knowledge and insights from both structured and unstructured data. It combines elements of statistics, computer science, and domain expertise to analyze complex data and derive actionable conclusions. The goal of Data Science is often to make data-driven decisions, predict trends, and provide meaningful insights that can guide business and research. ==Key Component...") Tag: Visual edit
  • 20:44, 3 November 2024 Bayes' Theorem [1,003 bytes] 핵톤 (Created page with "'''Bayes' theorem''' is a fundamental principle in probability theory and statistics, which describes how to update the probability of a hypothesis based on new evidence. It provides a mathematical framework for reasoning under uncertainty and is often used in machine learning, especially in algorithms like Naive Bayes. The theorem is expressed as: P(A | B) = (P(B | A) * P(A)) / P(B) where: *'''P(A | B)''' is the posterior probability: the probability of event A occur...") Tag: Visual edit
  • 20:38, 3 November 2024 Naive Bayes [2,165 bytes] 핵톤 (Created page with "'''Naive Bayes''' The '''Naive Bayes''' algorithm is a probability-based classification method that calculates the likelihood of data belonging to a specific class by using conditional probabilities. As suggested by the term "naive," this algorithm assumes that each feature is independent of the others. While this assumption is often unrealistic, Naive Bayes proves to be practical and efficient in classification tasks, providing good performance on real-world data. Naive...")
  • 10:18, 2 November 2024 Main Page [1,333 bytes] Itwiki (Created page with "This wiki is primarily in Korean and intended for Korean users. Some documents are available in English. If you search in English and are redirected to a Korean page, it means that the document is only available in Korean. However, users from English-speaking countries are also welcome to contribute freely to this wiki.") Tag: Visual edit
  • 10:16, 2 November 2024 Model Interpretability [2,575 bytes] 핵톤 (Created page with "'''Model interpretability''' is the ability to understand or explain how a model performs predictions. It indicates whether a model’s decisions can be explained to humans and how well the reasoning behind them can be communicated. == Interpretability by Model Type == The following list generally starts with models that are considered to have higher interpretability. === Linear Regression === * A simple mathematical model that assigns linear coefficients to each feature,...") Tag: Visual edit
  • 10:05, 2 November 2024 모델 해석 가능성 [2,903 bytes] 핵톤 (Created page with "Model Interpretability. Model interpretability is the ability to understand or explain how an artificial intelligence or machine learning model makes its predictions. It indicates whether a model's decisions can be explained to people and how well the reasons behind them can be conveyed. == Interpretability by Model Type == The list below is ordered so that models generally considered more interpretable come first....")
  • 10:03, 2 November 2024 모델 해석가능성 [41 bytes] 핵톤 (Redirected page to 모델 해석 가능성) Tag: New redirect
  • 21:52, 1 November 2024 Scp [31 bytes] 웹개발자 (Redirected page to 리눅스 scp) Tags: New redirect Visual edit
  • 21:51, 1 November 2024 Tar [31 bytes] 웹개발자 (Redirected page to 리눅스 tar) Tags: New redirect Visual edit
  • 16:10, 1 November 2024 CRUD [1,839 bytes] 핵톤 (Created page with "CRUD stands for '''Create, Read, Update, Delete''', the four basic operations essential to database management and application development. == Components == # '''Create''': Adds new data to the database. #* Examples include creating a user account, registering an order, or adding a product. # '''Read''': Retrieves or views existing data. #* For ex...") Tag: Visual edit
  • 15:23, 1 November 2024 해석 가능한 모델 [40 bytes] 핵톤 (Created page with "Interpretable Models. An interpretable model is one whose predictions come with explanations that are easy to understand. This mainly covers models in which the effect of each feature on the outcome can be clearly identified. == Interpretability by Model Type == The list below is ordered so that models generally considered more interpretable come first. The models further down have interpretability...") Tag: Visual edit
  • 11:25, 30 October 2024 의대 계약정원제 [4,446 bytes] 172.70.114.223 (Created page with "The medical school contract quota system (의대 계약정원제) is a policy under which some students at regional medical schools are admitted on the condition that they work at a regional public hospital for at least a set period. == Purpose == The policy can keep regional medical personnel from leaving for the Seoul metropolitan area and help close gaps in medical care. Because doctors are currently reluctant to work outside the capital region, a shortage of essential medical staff means that residents there (mostly the elderly) cannot receive treatment even when they are ill...") Tag: Visual edit
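
Several of the classification-metric pages listed above quote their defining formulas (Accuracy, Precision, Recall, Specificity, False Positive Rate, and the F1 Score as the harmonic mean of precision and recall). The following is a minimal Python sketch of those formulas computed from the four confusion-matrix counts. The class name `ConfusionCounts`, the helper `_ratio`, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; none of them comes from the wiki pages.

```python
from dataclasses import dataclass


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely; returning 0.0 for a zero denominator is an assumed convention."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four cells of a binary confusion matrix."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives

    @property
    def accuracy(self) -> float:
        # (TP + TN) / total number of instances
        return _ratio(self.tp + self.tn, self.tp + self.fn + self.fp + self.tn)

    @property
    def precision(self) -> float:
        # TP / (TP + FP)
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def recall(self) -> float:
        # TP / (TP + FN), also called sensitivity or true positive rate
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # TN / (TN + FP), also called true negative rate
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def false_positive_rate(self) -> float:
        # FP / (FP + TN), which equals 1 - specificity
        return _ratio(self.fp, self.fp + self.tn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall
        return _ratio(2 * self.precision * self.recall, self.precision + self.recall)


if __name__ == "__main__":
    counts = ConfusionCounts(tp=40, fn=10, fp=20, tn=30)
    print(f"accuracy={counts.accuracy:.3f} precision={counts.precision:.3f} "
          f"recall={counts.recall:.3f} specificity={counts.specificity:.3f} "
          f"fpr={counts.false_positive_rate:.3f} f1={counts.f1:.3f}")
```

Keeping the raw counts in one immutable object and deriving every metric from them means the metrics cannot drift out of sync with one another, for example the false positive rate and specificity always sum to 1 whenever at least one actual negative exists.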