Distributed Database
From CS Wiki
Revision as of 12:35, 15 December 2024 by Betripping
A distributed database is a collection of databases spread across multiple physical locations that function as a single logical database. Each site can operate independently while participating in the unified database system by communicating over a network.
Key Concepts
- Data Distribution: Data is distributed across multiple sites based on factors like performance, reliability, and locality.
- Transparency: Users interact with the distributed database as if it were a single database, regardless of the underlying distribution.
- Replication: Data is duplicated across multiple sites to improve fault tolerance and availability.
- Partitioning: Data is divided into subsets, each stored at a specific location.
Characteristics
Distributed databases are defined by the following characteristics:
- Distributed Data Storage: Data is stored on multiple nodes or sites.
- Autonomy: Each node can function independently and manage its local database.
- Transparency:
  - Location Transparency: Users do not need to know where data is physically stored.
  - Replication Transparency: Users are unaware of data being replicated across sites.
  - Fragmentation Transparency: Users do not need to know how data is partitioned.
- Scalability: The system can grow by adding more nodes.
- Fault Tolerance: Replication and redundancy provide resilience to failures.
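Location transparency can be illustrated with a small catalog that maps table names to sites. The following is a minimal sketch, not a real DBMS component: the `Node` and `Catalog` classes and the sample rows are hypothetical and exist only for illustration.

```python
class Node:
    """A hypothetical site that keeps its tables in memory."""

    def __init__(self, name):
        self.name = name
        self.tables = {}


class Catalog:
    """Maps each table name to the node that stores it."""

    def __init__(self):
        self._location = {}

    def register(self, table, node, rows):
        # Store the rows on the node and remember where the table lives.
        node.tables[table] = list(rows)
        self._location[table] = node

    def scan(self, table):
        # The caller names only the table; the catalog resolves the site.
        node = self._location[table]
        return node.tables[table]


node1 = Node("node1")
node2 = Node("node2")
catalog = Catalog()
catalog.register("employees", node1, [{"name": "Ana", "dept_id": 1}])
catalog.register("departments", node2, [{"dept_id": 1, "name": "Sales"}])

rows = catalog.scan("departments")  # no node is mentioned by the caller
```

The caller of `scan` never refers to `node1` or `node2`, so a table could be moved to another site by changing only the catalog entry.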
Types of Distributed Databases
Distributed databases can be classified based on their architecture:
- Homogeneous Distributed Database:
  - All nodes use the same database management system (DBMS).
  - Example: A PostgreSQL cluster.
- Heterogeneous Distributed Database:
  - Nodes may use different DBMSs but are integrated into a single system.
  - Example: A system integrating MySQL and Oracle databases.
- Federated Database:
  - Autonomous databases are integrated through a middleware layer.
  - Example: A research database integrating multiple institutional datasets.
Advantages
- Improved Performance: Data is stored closer to where it is needed, reducing access time.
- Fault Tolerance: Data replication keeps the system available when individual nodes fail, as long as a reachable replica remains.
- Scalability: The system can handle growing amounts of data by adding more nodes.
- Resource Sharing: Enables sharing of hardware, software, and data resources.
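The fault-tolerance benefit of replication can be sketched as a read that falls back to another copy when a site is down. This is an illustrative sketch under simple assumptions: `ReplicaNode`, `NodeDown`, `read_with_failover`, and the site names are hypothetical, and failures are simulated with a flag rather than real network errors.

```python
class NodeDown(Exception):
    """Raised when a (hypothetical) replica cannot be reached."""


class ReplicaNode:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)  # each replica holds its own copy
        self.up = up

    def get(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try replicas in order; return (value, name of the node that answered)."""
    for node in replicas:
        try:
            return node.get(key), node.name
        except NodeDown:
            continue  # this copy is unavailable, try the next one
    raise NodeDown("all replicas unavailable")


record = {"emp:1": "Alice"}
replicas = [
    ReplicaNode("site-a", record, up=False),  # simulated failure
    ReplicaNode("site-b", record),
    ReplicaNode("site-c", record),
]
value, served_by = read_with_failover(replicas, "emp:1")
```

Here the read succeeds from the second site even though the first is down. The read fails only when every replica is unavailable.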
Limitations
- Complexity: Managing a distributed database is more complex than a centralized one.
- Consistency: Maintaining consistency across nodes in a distributed system can be challenging.
- Communication Overhead: Data synchronization and query execution across nodes incur network overhead.
- Latency: Network delays can affect query response times.
Example: Distributed Query in a Distributed Database
Consider a distributed database with two nodes:
- Node 1 stores employee data.
- Node 2 stores department data.
Query: Retrieve the names of employees in the "Sales" department.
Steps
Step | Action | Performed On |
---|---|---|
1 | Parse query: SELECT employees.name FROM employees JOIN departments ON employees.dept_id = departments.dept_id WHERE departments.name = 'Sales' | Query Coordinator |
2 | Decompose query into sub-queries: one over the department data on Node 2 and one over the employee data on Node 1 | Query Coordinator |
3 | Execute sub-queries on respective nodes | Node 1, Node 2 |
4 | Combine results and return final output | Query Coordinator |
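The steps above can be sketched in Python, with two in-memory SQLite databases standing in for the two nodes. This is only one possible decomposition, assumed for illustration: the sample rows, the function name `employees_in_department`, and the exact sub-queries are not taken from the original example.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two nodes.
node1 = sqlite3.connect(":memory:")  # Node 1: employee data
node2 = sqlite3.connect(":memory:")  # Node 2: department data

node1.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
node1.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 1), ("Bob", 2), ("Carol", 1)],
)
node2.execute("CREATE TABLE departments (dept_id INTEGER, name TEXT)")
node2.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Sales"), (2, "Engineering")],
)


def employees_in_department(dept_name):
    """Coordinator: run one sub-query per node, then combine the results."""
    # Sub-query on Node 2: resolve the department name to its ids.
    dept_ids = [
        row[0]
        for row in node2.execute(
            "SELECT dept_id FROM departments WHERE name = ?", (dept_name,)
        )
    ]
    if not dept_ids:
        return []
    # Sub-query on Node 1: fetch the employees that belong to those ids.
    placeholders = ", ".join("?" for _ in dept_ids)
    rows = node1.execute(
        f"SELECT name FROM employees WHERE dept_id IN ({placeholders}) "
        "ORDER BY name",
        dept_ids,
    )
    return [row[0] for row in rows]


sales = employees_in_department("Sales")
```

Only the matching department ids travel between the nodes, rather than an entire table, which keeps the communication overhead of the join small.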
Data Distribution Techniques
Distributed databases use the following techniques to distribute data:
- Replication:
  - Duplicates data across multiple sites.
  - Improves fault tolerance and read performance but requires synchronization.
- Fragmentation:
  - Divides data into fragments, stored at different sites.
  - Types:
    - Horizontal Fragmentation: Divides a table into subsets of rows.
    - Vertical Fragmentation: Divides a table into subsets of columns.
    - Hybrid Fragmentation: Combines horizontal and vertical fragmentation.
- Hybrid Distribution:
  - Combines replication and fragmentation to optimize performance and fault tolerance.
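Horizontal and vertical fragmentation can be illustrated on a small in-memory table. The sketch below makes several assumptions: the `employees` rows, the column names, and the helper functions are hypothetical, and each vertical fragment keeps the primary key `id` so the original rows can be rebuilt.

```python
employees = [
    {"id": 1, "name": "Alice", "region": "EU", "salary": 5000},
    {"id": 2, "name": "Bob", "region": "US", "salary": 6000},
    {"id": 3, "name": "Carol", "region": "EU", "salary": 5500},
]


def horizontal_fragments(rows, key):
    """Group whole rows by the value of `key` (one fragment per value)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def vertical_fragment(rows, columns, pk="id"):
    """Keep only `columns`, plus the primary key so rows can be rejoined."""
    keep = [pk] + [c for c in columns if c != pk]
    return [{c: row[c] for c in keep} for row in rows]


def rejoin(left, right, pk="id"):
    """Rebuild full rows from two vertical fragments by matching on `pk`."""
    right_by_pk = {row[pk]: row for row in right}
    return [{**row, **right_by_pk[row[pk]]} for row in left]


by_region = horizontal_fragments(employees, "region")   # rows split by region
public = vertical_fragment(employees, ["name", "region"])
private = vertical_fragment(employees, ["salary"])
restored = rejoin(public, private)                       # equals `employees`
```

Each horizontal fragment could be stored at the site for its region, while the two vertical fragments could be placed on different sites, for example to keep salary data separate.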
Applications
Distributed databases are widely used in:
- Global Enterprises: Managing geographically dispersed data.
- Cloud Databases: Supporting distributed cloud-based platforms like Google Spanner and Amazon Aurora.
- IoT Systems: Managing data from distributed devices.
- Big Data Analytics: Processing large-scale distributed datasets.
Challenges
Distributed databases face several challenges:
- Data Consistency: Ensuring consistency across replicas while maintaining performance.
- Network Partitioning: Handling situations where communication between nodes is disrupted.
- Query Optimization: Efficiently executing queries across distributed nodes.
- Security: Securing data transmission and storage across multiple locations.
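One widely used way to approach the data consistency challenge, which this article does not otherwise cover, is quorum-based replication: with N replicas, a write must reach W of them and a read must consult R of them, where W + R > N, so every read set overlaps the latest write set. The sketch below is a simplified illustration. `QuorumStore`, the `targets` parameter, and the single shared version counter are assumptions; a real system would need a distributed way to order writes.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None


class QuorumStore:
    """N replicas; writes reach W of them, reads consult R of them."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need W + R > N so read and write sets overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0  # simplification: one central version counter

    def write(self, value, targets):
        """Apply a write to the replicas at indices `targets` (at least W)."""
        reachable = set(targets)
        if len(reachable) < self.w:
            raise ValueError("write quorum not reached")
        self.version += 1
        for i in reachable:
            self.replicas[i].version = self.version
            self.replicas[i].value = value

    def read(self, targets):
        """Return the newest value among the replicas at indices `targets`."""
        reachable = set(targets)
        if len(reachable) < self.r:
            raise ValueError("read quorum not reached")
        newest = max(
            (self.replicas[i] for i in reachable),
            key=lambda rep: rep.version,
        )
        return newest.value


store = QuorumStore(n=3, w=2, r=2)
store.write("v1", targets=[0, 1, 2])
store.write("v2", targets=[0, 1])    # replica 2 is unreachable and stays stale
latest = store.read(targets=[1, 2])  # overlaps the write set at replica 1
```

The read returns the newest value even though one of the replicas it consulted is stale. When a network partition leaves fewer than R or W replicas reachable, the operation is rejected instead of returning possibly outdated data.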