Month 4 Databases
Database Management Systems Comparison
This document provides a detailed comparison between some of the most popular database management systems: MongoDB, Couchbase, PostgreSQL, MSSQL, MySQL, and MariaDB. Below you will find detailed descriptions and a comparison table.
MongoDB
MongoDB is a NoSQL database that stores data in a flexible, JSON-like document format. It is designed for scalability and performance, making it an excellent choice for applications that require high availability and horizontal scaling. MongoDB uses its own MongoDB Query Language (MQL), which is similar to SQL but optimized for querying JSON documents.
- Data Model: Document (JSON-like)
- Query Language: MongoDB Query Language (MQL)
- Scalability: High, horizontal scaling
- Transactions: Limited support
- Advanced Features: Sharding, replication
- Use Cases: Big Data, Real-time analytics
- License: Server Side Public License
- Commercial Support: Available
Couchbase
Couchbase is a NoSQL database that combines the flexibility and scalability of a document database with the speed and reliability of a key-value store. It is designed for high performance and high availability, making it suitable for applications that require real-time data processing. Couchbase supports SQL-like querying through its N1QL query language, allowing for complex queries on JSON documents.
- Data Model: Document + Key-value pairs
- Query Language: N1QL (SQL-like)
- Scalability: High, horizontal scaling
- Transactions: Limited support
- Advanced Features: Full-text search, replication
- Use Cases: Real-time applications, Mobile Apps
- License: Apache 2.0
- Commercial Support: Available
PostgreSQL
PostgreSQL is a powerful, open-source relational database management system that stores data in tables with rows and columns. It is known for its advanced features such as support for complex data types, transactions, and full-text search. PostgreSQL uses standard SQL queries and is a popular choice for a wide range of applications due to its robustness and large community of users and contributors.
- Data Model: Tables with rows & columns
- Query Language: SQL
- Scalability: Moderate, vertical scaling
- Transactions: Full support
- Advanced Features: Complex data types, full-text search, JSON support
- Use Cases: Web apps, Data warehousing
- License: PostgreSQL License
- Commercial Support: Available
MSSQL (Microsoft SQL Server)
Microsoft SQL Server (MSSQL) is a relational database management system developed by Microsoft. It is known for its enterprise features, including support for high availability, security, and advanced business intelligence tools. MSSQL uses SQL queries and is widely used in enterprise environments for mission-critical applications.
- Data Model: Tables with rows & columns
- Query Language: SQL
- Scalability: Moderate, vertical scaling
- Transactions: Full support
- Advanced Features: Business intelligence tools, OLTP, OLAP
- Use Cases: Enterprise applications, Data Warehousing
- License: Proprietary (Microsoft EULA)
- Commercial Support: Microsoft Support
MySQL
MySQL is an open-source relational database management system widely used for web applications. It is known for its speed, reliability, and ease of use, making it a popular choice for small to medium-sized applications. MySQL supports SQL queries and has a large community of users and contributors, which makes it versatile and flexible for a wide range of applications.
- Data Model: Tables with rows & columns
- Query Language: SQL
- Scalability: Moderate, vertical scaling
- Transactions: Full support
- Advanced Features: Replication, partitioning
- Use Cases: Web apps, CMS systems, eCommerce
- License: GNU GPL
- Commercial Support: Available
MariaDB
MariaDB is a community-developed fork of MySQL and is designed to be highly compatible with MySQL. It aims to maintain MySQL’s speed, reliability, and ease of use while providing enhanced features, performance improvements, and a more open development process. Like MySQL, MariaDB uses SQL queries and is used for various applications, from small web apps to large enterprise systems.
- Data Model: Tables with rows & columns
- Query Language: SQL
- Scalability: Moderate, vertical scaling
- Transactions: Full support
- Advanced Features: Enhanced features, better performance tuning, replication
- Use Cases: Web apps, Data warehousing, CMS systems
- License: GNU GPL
- Commercial Support: Available
Comparison Table
| Feature/Aspect | MongoDB | Couchbase | PostgreSQL | MSSQL | MySQL | MariaDB |
|---|---|---|---|---|---|---|
| Type | NoSQL, Document Store | NoSQL, Document Store + Key-Value | Relational Database | Relational Database | Relational Database | Relational Database |
| Data Model | JSON-like documents | JSON documents + Key-value pairs | Tables with rows & columns | Tables with rows & columns | Tables with rows & columns | Tables with rows & columns |
| Query Language | MongoDB Query Language (MQL) | N1QL (SQL-like) | SQL | SQL | SQL | SQL |
| Scalability | High, horizontal scaling | High, horizontal scaling | Moderate, vertical scaling | Moderate, vertical scaling | Moderate, vertical scaling | Moderate, vertical scaling |
| Performance | High for unstructured data | High for real-time applications | High | High | High | High |
| Transactions | Limited support | Limited support | Full support | Full support | Full support | Full support |
| Advanced Features | Sharding, replication | Full-text search, replication | Complex data types, full-text search, JSON support | Business intelligence tools, OLTP, OLAP | Replication, partitioning | Enhanced features, better performance tuning, replication |
| Use Cases | Big Data, Real-time analytics | Real-time applications, Mobile Apps | Web apps, Data warehousing | Enterprise applications, Data Warehousing | Web apps, CMS systems, eCommerce | Web apps, Data warehousing, CMS systems |
| License | Server Side Public License | Apache 2.0 | PostgreSQL License | Proprietary (Microsoft EULA) | GNU GPL | GNU GPL |
Chapter 2:
What is a Database?
Introduction
We will answer the question: “What is a database?”
Definition
A database is a collection of organized data that can be easily accessed, stored electronically, and can contain anything from text to pictures and video.
Database Management System (DBMS)
Typically, when referencing a database in IT, it’s about a Database Management System (DBMS). A DBMS is a collection of programs that allows users to access, control, enter, manipulate, and display data within the database. End users and applications interact with the DBMS to manage the database data.
Types of Database Management Systems
There are four main types of DBMS:
- Hierarchical:
- Uses a tree-like parent-child model.
- Each child record has one parent, while a parent can have multiple children.
- Network:
- Similar to hierarchical but allows each node to have relationships with multiple entities.
- Uses terms like “member” for child, and “occupier” for parent.
- Relational:
- Stores data in tables with columns and rows.
- Columns define data attributes and types; rows represent records.
- Keys uniquely identify rows and link data across tables.
- Object-Oriented:
- Represents data as objects and classes, similar to object-oriented programming.
- Objects are real-world entities and classes are collections of objects.
Popular Database Management Systems
Some popular DBMS examples include:
- MySQL: The focus of this course.
- MariaDB: A fork of MySQL started by its original developers.
- Oracle Database: Popular in the enterprise space.
- Microsoft SQL Server: Commonly used on Windows, now also supports Linux.
- PostgreSQL: An open-source, vendor-neutral DBMS with significant community contributions.
Although each database system stores data differently, the data can often be exported to be used in different DBMSs.
Chapter 3
Google Cloud SQL
Google Cloud SQL is a fully-managed relational database service for MySQL, PostgreSQL, and SQL Server. It simplifies database management tasks like patching, backups, and monitoring.
Key Features:
- Managed Service: Automatic updates, patch management, and backups.
- Scalability: Vertical scaling to accommodate varying workloads.
- High Availability: Built-in replication and failover within a region.
- Security: Data encryption at rest and in transit; integration with Google Cloud IAM.
- Compatibility: Supports MySQL, PostgreSQL, and SQL Server.
- Integration: Works seamlessly with other Google Cloud services and provides APIs for custom development.
Use Cases: Ideal for applications requiring traditional relational databases, transactional and analytical workloads, and high compatibility with MySQL, PostgreSQL, or SQL Server.
Google Cloud Spanner
Google Cloud Spanner is a fully-managed, horizontally scalable, globally distributed relational database service. It combines the benefits of SQL with the scalability of NoSQL databases.
Key Features:
- Global Distribution: Data stored across multiple regions worldwide with strong consistency.
- Scalability: Horizontally scalable for massive workloads.
- ACID Transactions: Supports distributed transactions with ACID properties.
- SQL Compatibility: Uses ANSI SQL for easier adoption.
- High Performance: Designed for high throughput and low latency.
- Strong Consistency: Guarantees strong consistency across all regions and nodes.
- Google Cloud Integration: Integrated with other Google Cloud services.
Use Cases: Suitable for applications requiring global scalability, complex transactions, and high availability, such as financial systems and global retail platforms.
Differences
| Feature | Google Cloud SQL | Google Cloud Spanner |
|---|---|---|
| Scalability | Vertical scaling | Horizontal scaling across nodes and regions |
| Architecture | Single-region deployment with regional replication | Multi-region distribution with strong consistency |
| Use Cases | Traditional relational database needs | Applications needing global scalability and consistency |
| Consistency | Strong consistency within a single instance | Strong global consistency |
| High Availability | Regional replication and failover | Multi-region replication and failover |
| Performance | Optimized for moderate workloads | Optimized for high throughput, large-scale workloads |
| SQL Compatibility | MySQL, PostgreSQL, SQL Server | ANSI SQL compatibility |
| Management | Managed service with automatic maintenance | Fully managed with advanced configuration options |
In summary:
- Google Cloud SQL is best for applications needing a traditional relational database with MySQL, PostgreSQL, or SQL Server support, focusing on regional workloads.
- Google Cloud Spanner is ideal for applications that demand high scalability, global distribution, and strong consistency, making it suitable for critical, large-scale transactional systems.
Google Cloud Bigtable
Google Cloud Bigtable is a fully-managed, scalable NoSQL database service designed for large analytical and operational workloads. It is ideal for applications requiring high throughput and low latency.
Key Features:
- Scalability: Scales to petabytes of data with automatic sharding and load balancing.
- Performance: Delivers high throughput and low latency for both reads and writes.
- Integration: Integrates seamlessly with Hadoop, Dataflow, and other big data tools.
- Schema Design: Optimized for wide-column data model, similar to HBase.
Use Cases: Ideal for time-series data, IoT data, user analytics, ad tech, and financial data analysis.
Relational Database Management Systems (RDBMS)
Introduction
What is an RDBMS?
An RDBMS is a software system used to interact with and maintain a relational database. The terms relational database, relational database system, and RDBMS are often used interchangeably.
Key Characteristics
- Structured Format:
- Stores data in tables with columns and rows.
- Columns define specific values allowed.
- Rows must comply with column definitions.
- Table Relationships:
- Tables can be linked via shared columns.
- Enables complex queries across multiple tables using the JOIN clause in SQL statements.
- SQL Implementation:
- SQL (Structured Query Language) is used for querying and managing data.
- Became an official standard in the 1980s.
- SQL knowledge facilitates transitioning between different database systems.
- Stored Procedures:
- Support the use of stored procedures to execute multiple SQL statements.
- Can restrict direct table access, requiring interaction through stored procedures.
Keys
- Primary Key:
- Uniquely identifies table rows.
- Values must be unique and not null.
- Often uses the integer data type, set to auto-increment for efficiency.
- Foreign Key:
- Creates links between tables.
- References primary keys or other unique columns in another table.
- Establishes referential relationships.
Conclusion
Most database management systems today are relational due to their structured and logical data organization, allowing for efficient and complex querying.
Google Cloud Storage
Google Cloud Storage is a scalable object storage service designed for storing and retrieving any amount of data at any time.
Key Features:
- Durability: Designed for 99.999999999% (11 9s) durability.
- Scalability: Scales to exabytes of data.
- Cost-Effective: Multiple storage classes for cost optimization based on access frequency.
- Integration: Integrates with other Google Cloud services and third-party tools.
Use Cases: Suitable for storing unstructured data like images, videos, backups, and large datasets for analytics.
When to Use NoSQL vs. Relational Databases
NoSQL Databases:
- Flexibility: Schema-less design allows for semi-structured or unstructured data.
- Scalability: Horizontal scaling capabilities for high throughput and large datasets.
- Use Cases: Ideal for real-time analytics, IoT, social media, big data applications, and applications requiring flexible data models.
Relational Databases:
- Consistency: Strong ACID transaction support ensures data integrity.
- Structured Data: Requires fixed schema and structured data.
- Use Cases: Suitable for applications needing complex queries, transactions, and strong consistency, like financial systems, enterprise applications, and CRM systems.
Google Firebase Realtime Database
Firebase Realtime Database is a NoSQL cloud database that stores and syncs data between users in real-time.
Key Features:
- Real-Time Sync: Data is synchronized in real-time across all connected clients.
- Offline Support: Data is available offline and syncs when the device reconnects.
- Security: Built-in security rules for data validation and access control.
Use Cases: Ideal for chat applications, real-time collaboration tools, and live data updates in mobile and web apps.
Google Cloud Firestore
Google Cloud Firestore is a NoSQL document database that offers real-time synchronization and supports complex querying.
Key Features:
- Real-Time Sync: Synchronizes data in real-time across clients.
- Complex Queries: Supports complex queries and indexing.
- Scalability: Scales automatically to handle massive loads.
- Offline Support: Data is accessible offline and syncs when reconnected.
- Integration: Integrates seamlessly with Firebase and other Google Cloud services.
Use Cases: Suitable for scalable mobile, web, and server applications requiring structured data, real-time updates, and offline capabilities.
Google Cloud Memorystore
Google Cloud Memorystore is a fully-managed in-memory data store service for Redis and Memcached, offering high performance and low latency.
Key Features:
- Managed Service: Handles patching, monitoring, and failover.
- Performance: Designed for high throughput and low-latency data access.
- Scalability: Scales to meet application demands.
- Security: Supports VPC network connectivity for secure data access.
Use Cases: Ideal for caching, session management, real-time analytics, and fast data retrieval.
Summary Table
| Service | Description | Use Cases |
|---|---|---|
| Google Cloud SQL | Managed relational DB for MySQL, PostgreSQL, SQL Server | Traditional relational DB needs, moderate workloads |
| Google Cloud Spanner | Managed, scalable, globally-distributed relational DB | Large-scale apps requiring global consistency |
| Google Bigtable | Managed, scalable NoSQL wide-column store | Time-series data, IoT, user analytics |
| Google Cloud Storage | Scalable object storage | Unstructured data, large datasets, backups |
| Firebase Realtime DB | NoSQL DB with real-time sync | Chat apps, real-time collaboration tools |
| Google Cloud Firestore | NoSQL document DB with real-time sync and complex queries | Scalable mobile/web apps, real-time updates |
| Google Cloud Memorystore | Managed in-memory data store for Redis/Memcached | Caching, session management, fast data retrieval |
NoSQL vs. Relational Database Use Cases
- NoSQL Databases: Flexible schema, scalable, real-time analytics, IoT, social media, big data applications.
- Relational Databases: Strong consistency, structured data, complex queries, transactions, enterprise applications, financial systems.