New pages
From CS Wiki
- 23:18, 1 December 2024 Observational Machine Learning Method (hist | edit) [4,378 bytes] Dendrogram (talk | contribs) (Created page with: '''Observational Machine Learning Methods''' are techniques designed to analyze data collected from observational studies rather than controlled experiments. In such studies, the assignment of treatments or interventions is not randomized, which can introduce biases and confounding factors. Observational ML methods aim to identify patterns, relationships, and causal effects within these datasets. ==Key Challenges in Observational Data== Observational data often comes with inhere...) Tag: Visual edit
- 23:13, 1 December 2024 Propensity Score Matching (hist | edit) [4,262 bytes] Dendrogram (talk | contribs) (Created page with: '''Propensity Score Matching (PSM)''' is a statistical technique used in observational studies to reduce selection bias when estimating the causal effect of a treatment or intervention. It involves pairing treated and untreated units with similar propensity scores, which represent the probability of receiving the treatment based on observed covariates. ==Key Concepts== *'''Propensity Score:''' The probability of a unit receiving the treatment, given its covariates. *'''Matching:...) Tag: Visual edit
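As a quick aside for this entry, here is a minimal sketch of the matching idea, assuming scikit-learn and NumPy are available; the synthetic data and the 1-nearest-neighbour matching rule are illustrative choices, not taken from the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy observational data: X = covariates, t = treatment indicator, y = outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
y = X[:, 0] + 2 * t + rng.normal(size=200)

# 1. Estimate propensity scores P(t = 1 | X) with a logistic regression.
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# 2. Match each treated unit to the control unit with the closest score.
treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
matches = control[np.abs(ps[treated][:, None] - ps[control]).argmin(axis=1)]

# 3. Estimate the average treatment effect on the treated (ATT).
att = (y[treated] - y[matches]).mean()
print(f"Estimated ATT: {att:.2f}")
```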
- 23:13, 1 December 2024 Causal Graph (hist | edit) [3,695 bytes] Dendrogram (talk | contribs) (Created page with: '''Causal Graph''' is a directed graph used to represent causal relationships between variables in a dataset. Each node in the graph represents a variable, and directed edges (arrows) indicate causal influence from one variable to another. Causal graphs are widely used in causal inference, machine learning, and decision-making processes. ==Key Components of a Causal Graph== A causal graph typically consists of the following: *'''Nodes:''' Represent variables in the system (e.g.,...) Tag: Visual edit
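A small sketch of a causal graph as a data structure, assuming the networkx library; the variable names are illustrative, not from the article.

```python
import networkx as nx

# A tiny causal DAG: nodes are variables, a directed edge means "causes".
g = nx.DiGraph()
g.add_edges_from([
    ("smoking", "tar_deposits"),
    ("tar_deposits", "cancer"),
    ("genetics", "smoking"),
    ("genetics", "cancer"),   # a confounder influencing two other variables
])

print(nx.is_directed_acyclic_graph(g))   # True: a valid causal graph has no cycles
print(list(g.predecessors("cancer")))    # direct causes of "cancer"
```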
- 23:12, 1 December 2024 Data Science Contents (hist | edit) [1,947 bytes] Dendrogram (talk | contribs) (Created page with: === 1. Understanding Data Science === * What is Data Science? * Impact on Business * Key Technologies in Data Science === 2. Data Preparation and Preprocessing === * Data Collection * Handling '''Missing Data''' and '''Outlier'''s * Normalization and Standardization === 3. Exploratory Data Analysis (EDA) === * Goals of Data Analysis * Basic Statistical Analysis * Importance of Data Visualization === 4. Supervised Learning === *...) Tag: Visual edit
- 23:11, 1 December 2024 Outlier (Data Science) (hist | edit) [4,314 bytes] Dendrogram (talk | contribs) (Created page with: '''Outlier''' refers to a data point that significantly deviates from other observations in a dataset. Outliers can arise due to variability in the data, errors in measurement, or rare events. Identifying and addressing outliers is critical in data preprocessing, as they can influence statistical analyses and machine learning models. ==Characteristics of Outliers== Outliers exhibit the following traits: *'''Deviation from Patterns:''' They do not conform to the general distribut...) Tag: Visual edit
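To make the entry above concrete, here is a minimal sketch of one common detection heuristic, the 1.5×IQR rule; the rule and the toy numbers are illustrative choices and are not named in the entry itself.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

data = np.array([10, 12, 11, 13, 12, 95, 11, 10])
print(data[iqr_outliers(data)])   # -> [95]
```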
- 16:21, 1 December 2024 Principal Component Analysis (hist | edit) [3,829 bytes] Dendrogram (talk | contribs) (Created page with: '''Principal Component Analysis (PCA)''' is a statistical technique used for dimensionality reduction by transforming a dataset into a new coordinate system. The transformation emphasizes the directions (principal components) that maximize the variance in the data, helping to reduce the number of features while preserving essential information. ==Key Concepts== *'''Principal Components:''' New orthogonal axes computed as linear combinations of the original features. The first pr...) Tag: Visual edit
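A minimal sketch of the transformation described above, assuming scikit-learn and NumPy; the synthetic correlated features and the choice of two components are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples with 5 correlated features (toy data).
rng = np.random.default_rng(42)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(100, 3))])

pca = PCA(n_components=2)          # keep the two highest-variance directions
X_reduced = pca.fit_transform(X)   # shape (100, 2)

print(pca.explained_variance_ratio_)  # fraction of total variance kept per component
```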
- 16:19, 1 December 2024 Singular Value Decomposition (hist | edit) [2,936 bytes] Dendrogram (talk | contribs) (Created page with: '''Singular Value Decomposition (SVD)''' is a mathematical technique used to decompose a matrix into three component matrices. It is widely used in data analysis, dimensionality reduction, machine learning, and signal processing. ==Definition== SVD decomposes a matrix \( A \) into three matrices: *'''U:''' An orthogonal matrix containing the left singular vectors. *'''Σ (Sigma):''' A diagonal matrix with singular values sorted in descending order. *'''V^T:''' An orthogonal matr...) Tag: Visual edit
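A short sketch of the decomposition A = U Σ V^T using NumPy; the example matrix is arbitrary.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Decompose A into U, Σ (returned as a vector of singular values) and V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from the three factors to verify the decomposition.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))   # True
print(s)                           # singular values in descending order
```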
- 16:13, 1 December 2024 Ontology (hist | edit) [3,040 bytes] Dendrogram (talk | contribs) (Created page with: '''Ontology''' in computer science and information science refers to a formal representation of knowledge within a specific domain. It defines concepts, relationships, and categories to facilitate reasoning, data integration, and knowledge sharing. ==Key Components of an Ontology== An ontology typically consists of the following elements: *'''Classes (Concepts):''' Represent the entities or objects in the domain. *'''Relationships:''' Define how classes are connected (e.g., "is-...) Tag: Visual edit
- 16:09, 1 December 2024 Dimensionality Reduction (hist | edit) [3,754 bytes] Dendrogram (talk | contribs) (Created page with: '''Dimensionality Reduction''' is a technique used in machine learning and data analysis to reduce the number of features (dimensions) in a dataset while preserving as much relevant information as possible. It simplifies data visualization, reduces computational costs, and helps mitigate the curse of dimensionality. ==Importance of Dimensionality Reduction== Dimensionality reduction is crucial for the following reasons: *'''Improves Model Performance:''' Reducing irrelevant or r...) Tag: Visual edit
- 16:05, 1 December 2024 Hash Function (hist | edit) [3,703 bytes] Dendrogram (talk | contribs) (Created page with: '''Hash Function''' is a mathematical function that transforms input data of arbitrary size into a fixed-length output, called a hash or digest. Hash functions are widely used in computer science, cryptography, and data management for tasks like data integrity, indexing, and secure storage. ==Characteristics of a Hash Function== A good hash function typically satisfies the following properties: *'''Deterministic:''' The same input always produces the same hash. *'''Fast Computat...) Tag: Visual edit
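A tiny illustration of the deterministic, fixed-length behaviour described above, using SHA-256 from Python's standard library as one example hash function.

```python
import hashlib

# The same input always yields the same fixed-length digest (deterministic),
# while a one-character change produces a completely different digest.
print(hashlib.sha256(b"hello world").hexdigest())
print(hashlib.sha256(b"hello world!").hexdigest())
print(len(hashlib.sha256(b"hello world").hexdigest()))  # 64 hex chars = 256 bits
```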
- 15:44, 1 December 2024 Dendrogram (hist | edit) [3,081 bytes] Dendrogram (talk | contribs) (Created page with: '''Dendrogram''' is a tree-like diagram used to represent the hierarchical relationships among a set of data points. It is commonly used in hierarchical clustering to visualize the order and structure of clusters as they are merged or divided. The height of each branch in a dendrogram indicates the distance or dissimilarity between clusters. ==Structure of a Dendrogram== A dendrogram consists of the following components: *'''Leaves:''' Represent individual data points or initial...) Tag: Visual edit
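A minimal sketch of drawing a dendrogram, assuming SciPy and matplotlib; the six toy points and Ward linkage are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Six 2-D points forming two loose groups.
X = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3],
              [4, 4], [4.2, 4.1], [3.9, 4.3]])

Z = linkage(X, method="ward")   # merge history: (cluster a, cluster b, distance, size)
dendrogram(Z)                   # branch height = dissimilarity at which clusters merge
plt.show()
```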
- 15:43, 1 December 2024 Hierarchical Clustering (hist | edit) [3,367 bytes] Dendrogram (talk | contribs) (Created page with: '''Hierarchical Clustering''' is a clustering method in machine learning and statistics that builds a hierarchy of clusters by either merging smaller clusters into larger ones (agglomerative) or dividing larger clusters into smaller ones (divisive). It is widely used for exploratory data analysis and in domains such as bioinformatics, marketing, and social network analysis. ==Types of Hierarchical Clustering== Hierarchical clustering is divided into two main types: *'''Agglomera...) Tag: Visual edit
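A small sketch of the agglomerative (bottom-up) variant mentioned above, assuming scikit-learn; the data and the choice of two clusters are illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3],
              [4, 4], [4.2, 4.1], [3.9, 4.3]])

# Agglomerative clustering: start with singletons, merge until 2 clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)   # e.g. [0 0 0 1 1 1] (label numbering may differ)
```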
- 15:40, 1 December 2024 K-Means++ (hist | edit) [2,884 bytes] Dendrogram (talk | contribs) (Created page with: '''K-Means++''' is an enhanced initialization algorithm for the K-Means clustering method. It aims to improve the selection of initial cluster centroids, which is a critical step in the K-Means algorithm. By carefully choosing starting centroids, K-Means++ reduces the chances of poor clustering outcomes and accelerates convergence. ==How K-Means++ Works== K-Means++ modifies the standard K-Means initialization by ensuring that the initial centroids are chosen in a way that they a...) Tag: Visual edit
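A minimal NumPy sketch of the K-Means++ seeding step described above; the function name and toy data are illustrative, and in practice scikit-learn's `KMeans(init="k-means++")` does this for you.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-Means++ seeding: spread initial centroids out, favouring distant points."""
    centroids = [X[rng.integers(len(X))]]          # first centroid: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min(((X[:, None, :] - np.array(centroids)) ** 2).sum(axis=2), axis=1)
        # Next centroid: sampled with probability proportional to that distance.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
print(kmeans_pp_init(X, k=3, rng=rng))
```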
- 15:02, 1 December 2024 K-Means (hist | edit) [3,918 bytes] Dendrogram (talk | contribs) (Created page with: '''K-Means''' is one of the most popular unsupervised machine learning algorithms used for clustering data into distinct groups. The algorithm partitions a dataset into '''k''' clusters, where each data point belongs to the cluster with the nearest mean. It is widely used for data analysis, pattern recognition, and feature engineering. ==How K-Means Works== The K-Means algorithm follows an iterative process to assign data points to clusters and optimize the cluster centroids: #I...) Tag: Visual edit
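A short sketch of the partitioning described above, assuming scikit-learn and NumPy; the three synthetic blobs and k=3 are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three blobs of 2-D points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [3, 3], [0, 3])])

# Partition into k=3 clusters; each point is assigned to the nearest centroid.
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
print(km.cluster_centers_)   # final centroids (means of each cluster)
print(km.labels_[:5])        # cluster index of the first five points
```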
- 23:38, 30 November 2024 Holdout (Data Science) (hist | edit) [3,203 bytes] Fortify (talk | contribs) (Created page with: '''Holdout''' in data science refers to a method used to evaluate the performance of machine learning models by splitting the dataset into separate parts, typically a training set and a testing set. The testing set, often called the "holdout set," is kept aside during model training and is only used for final evaluation to ensure unbiased performance metrics. ==How Holdout Works== The holdout method involves the following steps: *The dataset is split into two (or sometimes three...) Tag: Visual edit
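A minimal sketch of the holdout split, assuming scikit-learn; the Iris dataset, the 80/20 ratio, and the logistic-regression model are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; it is never seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the untouched holdout set
```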
- 23:22, 30 November 2024 PHP-FPM pm.max children (hist | edit) [3,975 bytes] Fortify (talk | contribs) (Created page with: '''pm.max_children''' is a directive in PHP-FPM (FastCGI Process Manager) configuration that specifies the maximum number of child processes that can be created to handle incoming requests. This setting is critical for managing server resources and ensuring that PHP-FPM can efficiently handle concurrent traffic without overloading the server. ==Overview== *The `pm.max_children` directive determines the upper limit on the number of simultaneous PHP-FPM worker processes. *When the...) Tag: Visual edit
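A hedged pool-file excerpt showing where the directive lives; the file path and the value 20 are illustrative, and the sizing comment is a common rule of thumb rather than a fixed rule.

```ini
; Example pool file, e.g. /etc/php/8.2/fpm/pool.d/www.conf (path varies by distribution)
[www]
pm = dynamic
; Hard cap on simultaneous PHP-FPM workers; requests beyond this are queued.
; A common rule of thumb: available RAM for PHP / average memory per worker.
pm.max_children = 20
```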
- 23:13, 30 November 2024 Apache FollowSymLinks (hist | edit) [3,193 bytes] Fortify (talk | contribs) (Created page with: '''FollowSymLinks''' is a directive in the Apache HTTP Server configuration that controls whether symbolic links (symlinks) in the server's document root or other directories can be followed. Symbolic links are files that point to other files or directories. The FollowSymLinks directive is often used to manage access and behavior related to these links in a web server environment. ==Syntax== The directive is used within Apache configuration files (e.g., `httpd.conf` or `.htacces...) Tag: Visual edit
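An illustrative `<Directory>` block for context; the directory path and the surrounding directives are example values, not taken from the article.

```apache
# Example block in httpd.conf or a virtual-host file; the path is illustrative.
<Directory "/var/www/html">
    # Allow Apache to follow symbolic links found under this directory.
    Options FollowSymLinks
    AllowOverride None
    Require all granted
</Directory>
```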
- 21:12, 30 November 2024 Diaper-Beer Syndrome (hist | edit) [2,776 bytes] Fortify (talk | contribs) (Created page with: '''Diaper-Beer Syndrome''' refers to a popular anecdote in data mining that suggests a correlation between the sales of diapers and beer. According to the story, data analysis at a retail store revealed that young fathers often purchased diapers and beer together, especially on Friday evenings. Although this example is frequently cited to demonstrate the potential of data mining, its authenticity remains doubtful. == The Legend == The legend goes as follows: * Retail analysts d...)
- 19:20, 30 November 2024 Leakage (Data Science) (hist | edit) [5,267 bytes] Prairie (talk | contribs) (Created page with: '''Leakage''' in data science refers to a situation where information from outside the training dataset is inappropriately used to build or evaluate a model. This results in overoptimistic performance metrics during model evaluation, as the model effectively "cheats" by having access to information it would not have in a real-world application. Leakage is a critical issue in machine learning workflows and can lead to misleading conclusions and poor model generalization. ==Types...) Tag: Visual edit
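A minimal sketch contrasting a leaky preprocessing pattern with a safer one, assuming scikit-learn; the breast-cancer dataset and the scaler/model choices are illustrative, and on this small dataset the score gap may be tiny.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Leaky pattern: the scaler is fit on ALL rows before cross-validation,
# so statistics from the held-out folds leak into the training folds.
X_leaky = StandardScaler().fit_transform(X)
leaky_score = cross_val_score(LogisticRegression(max_iter=5000), X_leaky, y, cv=5).mean()

# Safer pattern: put the scaler inside a pipeline so it is re-fit on each
# training fold only and never sees the evaluation fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
safe_score = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky: {leaky_score:.3f}  pipeline: {safe_score:.3f}")
```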
- 19:02, 30 November 2024 Ensemble Learning (hist | edit) [4,866 bytes] Prairie (talk | contribs) (Created page with "'''Ensemble Learning''' is a machine learning technique that combines multiple models, often called "base learners," to create a more powerful predictive model. By aggregating the predictions of several models, ensemble methods improve accuracy, reduce variance, and mitigate overfitting. Ensemble learning is widely used in classification, regression, and anomaly detection tasks. ==Overview== Ensemble learning leverages the idea that combining multiple models can outperfo...") Tag: Visual edit
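A short sketch of one ensemble pattern, majority voting over heterogeneous base learners, assuming scikit-learn; the three base models and the Iris dataset are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Combine three different base learners by majority vote.
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
])
print(cross_val_score(ensemble, X, y, cv=5).mean())
```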
- 18:57, 30 November 2024 Boosting (hist | edit) [4,388 bytes] Prairie (talk | contribs) (Created page with "'''Boosting''' is an ensemble learning technique in machine learning that focuses on improving the performance of weak learners (models that perform slightly better than random guessing) by sequentially training them on the mistakes made by previous models. Boosting reduces bias and variance, making it effective for building accurate and robust predictive models. ==Overview== The key idea behind boosting is to combine multiple weak learners into a single strong learner....") Tag: Visual edit
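A minimal sketch of sequential boosting of weak learners, using AdaBoost from scikit-learn (whose default weak learner is a depth-1 decision tree); the dataset and number of estimators are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each new weak learner (a decision stump by default) is trained with more
# weight on the examples the previous learners got wrong.
booster = AdaBoostClassifier(n_estimators=100, random_state=0)
print(cross_val_score(booster, X, y, cv=5).mean())
```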
- 14:57, 30 November 2024 Sidebar Korean (hist | edit) [832 bytes] Itwiki (talk | contribs) (Created page with: * 분류별 보기 ** :분류:일반 IT용어|일반 IT용어 ** :분류:프로젝트 관리|프로젝트 관리 ** :분류:디지털 서비스|디지털 서비스 ** :분류:블록체인|블록체인 ** :분류:인공지능|인공지능 ** :분류:소프트웨어 공학|소프트웨어 공학 ** :분류:운영체제|운영체제 ** :분류:컴퓨터 구조|컴퓨터 구조 ** :분류:자료 구조|자료 구조 ** :분류:데이터 과학|데이터 과학 ** :분류:데이터...)
- 14:57, 30 November 2024 Sidebar English (hist | edit) [78 bytes] Itwiki (talk | contribs) (Created page with: * Category ** :Category:Network|Network ** :Category:Data Science|Data Science) Tag: Visual edit: Switched
- 13:53, 30 November 2024 Bootstrap Aggregating (hist | edit) [4,422 bytes] Prairie (talk | contribs) (Created page with "'''Bootstrap Aggregating''', commonly known as '''Bagging''', is an ensemble learning method designed to improve the stability and accuracy of machine learning models. It works by combining the predictions of multiple base models, each trained on different subsets of the data created through the bootstrap sampling technique. Bagging reduces variance, mitigates overfitting, and improves model robustness. == Overview == Bootstrap aggregating is built on two fundamental co...")
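A minimal sketch of bagging with scikit-learn, whose `BaggingClassifier` trains each base model (a decision tree by default) on a bootstrap sample and aggregates the votes; the dataset and the 50-estimator count are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 50 base trees, each fit on a bootstrap sample drawn with replacement;
# predictions are combined by majority vote.
bagger = BaggingClassifier(n_estimators=50, random_state=0)
print(cross_val_score(bagger, X, y, cv=5).mean())
```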
- 00:19, 30 November 2024 Min-Max Scaling (hist | edit) [3,838 bytes] Fortify (talk | contribs) (Created page with "'''Min-Max Scaling''' is a data normalization technique used to scale features to a fixed range, typically [0, 1]. It ensures that all features contribute equally to the analysis or model by transforming the original values proportionally to fit within the specified range. Min-Max Scaling is widely used in data preprocessing for machine learning and statistical analysis. ==Overview== Min-Max Scaling transforms the data linearly by rescaling each value based on the featur...") Tag: Visual edit
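A one-line NumPy sketch of the rescaling formula x' = (x - min) / (max - min); the sample values are arbitrary.

```python
import numpy as np

x = np.array([10.0, 20.0, 35.0, 50.0])

# x' = (x - min) / (max - min)  ->  values rescaled into [0, 1]
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)   # [0.    0.25  0.625 1.   ]
```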
- 21:48, 29 November 2024 PHP-FPM Dynamic Process Management (hist | edit) [3,693 bytes] Fortify (talk | contribs) (Created page with "'''PHP-FPM Dynamic Process Management''' refers to one of the modes available in PHP-FPM (FastCGI Process Manager) to manage worker processes. In this mode, the number of worker processes dynamically adjusts based on server load, ensuring efficient use of system resources while maintaining the ability to handle varying traffic levels. ==Overview== PHP-FPM is a robust process manager for PHP, and its dynamic mode is designed to strike a balance between performance and res...") Tag: Visual edit
- 21:45, 29 November 2024 PHP-FPM pm.max spare servers (hist | edit) [2,705 bytes] Fortify (talk | contribs) (Created page with "'''pm.max_spare_servers''' is a configuration directive in PHP-FPM (FastCGI Process Manager) that specifies the maximum number of idle (spare) worker processes to maintain in the pool. It is used when the process manager (pm) is set to '''dynamic'''. This directive ensures that server resources are not wasted by limiting the number of idle worker processes. ==Overview== In dynamic process management mode, PHP-FPM adjusts the number of worker processes based on the server...") Tag: Visual edit
- 21:44, 29 November 2024 PHP-FPM pm.min spare servers (hist | edit) [2,705 bytes] Fortify (talk | contribs) (Created page with "'''pm.min_spare_servers''' is a configuration directive in PHP-FPM (FastCGI Process Manager) used to specify the minimum number of idle (spare) worker processes that should be maintained in the pool. It is applicable when the process manager (pm) is set to '''dynamic'''. This directive ensures that there are always enough idle workers available to handle incoming requests without unnecessary delays. ==Overview== When PHP-FPM is configured to use the '''dynamic''' process...") Tag: Visual edit
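A single illustrative pool-configuration excerpt tying together the three PHP-FPM entries above (dynamic process management, pm.max_spare_servers, pm.min_spare_servers); the file path and the numbers are placeholders, chosen only to satisfy PHP-FPM's requirement that min_spare_servers ≤ start_servers ≤ max_spare_servers.

```ini
; Illustrative pool file, e.g. /etc/php/8.2/fpm/pool.d/www.conf
[www]
pm = dynamic                ; worker count grows and shrinks with load
pm.max_children = 20        ; absolute ceiling on workers
pm.start_servers = 4        ; workers created at startup
pm.min_spare_servers = 2    ; keep at least this many idle workers ready
pm.max_spare_servers = 6    ; idle workers above this count are killed
```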
- 21:28, 29 November 2024 Time Series Data (hist | edit) [3,841 bytes] Fortify (talk | contribs) (Created page with "'''Time Series Data''' refers to a sequence of data points collected or recorded at successive, evenly spaced points in time. This type of data is used to track changes over time and is a critical component in various fields like finance, economics, environmental science, and machine learning. ==Overview== Time series data captures how a variable evolves over time. The primary characteristic of time series data is its temporal ordering, meaning that the order of the obse...") Tag: Visual edit
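A small pandas sketch of working with temporally ordered data; the hourly index, the synthetic values, and the resampling/rolling windows are illustrative choices.

```python
import pandas as pd

# Hourly observations over two days, indexed by timestamp (temporal order matters).
idx = pd.date_range("2024-11-29", periods=48, freq="h")
series = pd.Series(range(48), index=idx)

# Downsample to daily means and compute a 6-hour rolling average.
print(series.resample("D").mean())
print(series.rolling(window=6).mean().tail())
```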