Introduction
There are 3 critical differences between Bigtable and BigQuery:
- Bigtable is an OLTP (online transactional processing) system. BigQuery is a business intelligence/OLAP (online analytical processing) system.
- Bigtable is a NoSQL database service. BigQuery is more of a hybrid; it uses SQL dialects and is based on Google’s internal column-based data processing technology, "Dremel.”
- Bigtable is mutable and has a fast key-based lookup. BigQuery is immutable and has a slow key-based lookup.
Organizations are accumulating greater and greater amounts of information each year, causing the global “big data” sphere to increase exponentially. The International Data Corporation (IDC) estimates that “big data” will reach 175 zettabytes (175 trillion gigabytes) by 2025.
Due to this exponential growth, engineers have built cloud storage systems that are highly scalable, highly reliable, highly available, low-cost, self-healing, and decentralized. There are two main types of cloud storage: distributed file systems and distributed databases.
-
A distributed file system is distributed across multiple file servers or multiple locations. It allows users of physically distributed systems to share their data and resources by using a common file system. Clients can access and process data stored on the system as if it were on their machine.
-
A distributed database is a group of multiple logically related databases distributed over a computer network.
Google has been a leader when it comes to cloud storage of big data. In the early 2000s, Google developed the Google File System to meet the growing processing demands they encountered when storing and analyzing vast numbers of web pages (indexing web content). GFS essentially provides efficient, reliable access to data using large clusters of commodity hardware. The MapReduce paper followed in 2004—outlining a distributed computing and analysis model for processing massive data sets with a parallel, distributed algorithm on a cluster.
To mitigate the challenges associated with a large amount of formatted and semi-formatted data, the large-scale database system, Bigtable, emerged from the Google forge, built on top of MapReduce and GFS. BigQuery, a cloud-based query service for very large datasets, soon followed, built using Bigtable, Google Cloud Platform, and Google’s Dremel system for ad hoc queries.
Although Bigtable and BigQuery are both Google products for storing large-scale datasets, the two solutions are also quite different. In this article, we’ll give an overview of Bigtable and BigQuery as well as the most important differences between Bigtable and BigQuery, so that you can make the choice that’s right for your organization.
What is Google Cloud Bigtable?
Bigtable is a petabyte-scale, fully managed NoSQL database service, also known as a “NoSQL Database as a Service”. Bigtable supports weak consistency and can index, query, and analyze massive amounts of data.
In particular, Bigtable is ideal for storing vast quantities of single-keyed data with low latency and can support high read and write throughput at low latency. This makes it a perfect data source for MapReduce operations. Google products such as Analytics, Finance, Personalized Search, Earth, Orkut, and Writely (a predecessor to Google Docs) have all made use of Bigtable for their day-to-day operations, serving millions of Google customers.
What is Google Cloud BigQuery?
BigQuery is a powerful business intelligence tool that can be described as a “Big Data as a Service” solution. BigQuery is serverless, fully managed SQL data warehouse, which makes it capable of rapid SQL queries and interactive analysis of massive datasets (on the order of terabytes or petabytes). BigQuery is the equivalent of other data warehouse solutions from major public cloud providers, such as Amazon Web Services’ Redshift or Microsoft’s Azure SQL Data Warehouse.
Related Reading: Redshift vs. BigQuery
The BigQuery platform utilizes a columnar storage paradigm that allows for much faster data scanning, plus a tree architecture model that makes querying and aggregating results significantly more manageable and efficient. According to Google, BigQuery’s lightning-fast data processing can scan 35 billion rows in a matter of seconds. BigQuery provides the capability to integrate with the Apache big data ecosystem, and existing Hadoop/Spark and Beam workloads can also read or write data directly from BigQuery.
BigQuery is the external implementation of one of the company's core technologies, code-named Dremel (2006); Dremel and BigQuery share the same foundational architecture. Dremel is essentially a query execution engine that can independently scale compute nodes to mitigate computationally intensive queries. However, BigQuery leverages a myriad of other tools as well: Borg, Colossus (the successor of Google File System), Capacitor, and Jupiter, just to name a few.
If you want to offload data processing workloads using BigQuery, check out this tutorial. Or, for more information on Integrate.io's native Google BigQuery connector, visit our Integration page.
The Unified Stack for Modern Data Teams
Get a personalized platform demo & 30-minute Q&A session with a Solution Engineer
Bigtable vs. BigQuery: OLTP or OLAP?
One of the most important distinctions to make when comparing database systems is the question of OLTP vs. OLAP:
-
OLTP (online transactional processing) systems work with transactional data, support both read and write operations and are built for speed. This makes them best suited for recording day-to-day business activities. For example, a bank might use an OLTP system to keep track of users’ history of deposits, withdrawals, and transfers in their bank accounts.
-
OLAP (online analytical processing) systems work with aggregated historical data, support only read operations, and are built for fast retrieval of answers to user queries. OLAP systems can store orders of magnitude more data than OLTP systems. OLTP databases typically range up to 10 gigabytes, whereas OLAP systems can potentially store petabytes of information. The main use case for OLAP systems is business intelligence and analytics.
Related Reading: OLTP vs. OLAP
BigQuery is an OLAP system. Query latency is slow, and so the best use case for BigQuery is running queries with heavy workloads, such as traditional OLAP reporting and archiving jobs. The design of BigQuery does not encourage OLTP-style queries. To put this into context, small read-write operations cost roughly 1.8 seconds in BigQuery, while Bigtable costs roughly only 9 milliseconds for the same operation.
BigQuery supports atomic single-row operations but does not provide cross-row transaction support. Of course, the immutable nature of BigQuery tables means that queries are executed very efficiently in parallel. This makes it possible to execute reporting and OLAP-style queries against enormous datasets by running the operation on a countless number of nodes in parallel.
The fast read-by-key and update operations make Bigtable most suitable for OLTP workloads. The data model stores information within tables and rows have columns (Type Array or Struct). It is possible to add a column to a row; the structure is similar to a persistent map. Rows have a primary key that is unique for each record, making it possible to quickly read and update a record.
It is possible to perform reporting/OLAP workloads in Bigtable since Bigtable provides efficient support for key-range-iteration. However, if interactive querying in an online analytical processing setup is of prime concern, use BigQuery instead.
Bigtable vs. BigQuery: NoSQL or SQL?
Beyond OLTP/OLAP, perhaps the most crucial dividing line of them all when it comes to databases is SQL vs. NoSQL:
-
SQL databases use a relational schema, organizing data in tables, rows, and columns. Data in SQL databases are defined and manipulated using SQL (Structured Query Language). SQL databases typically emphasize vertical scalability, i.e. you can increase the load on a single server by adding resources such as CPU, memory, or disk space.
-
NoSQL databases use a non-relational schema (either document, graphs, key-value, or columnar), and can work with both structured and unstructured data. NoSQL databases typically emphasize horizontal scalability, i.e. you can increase the load with sharding or deploying multiple servers.
Related Reading: SQL vs. NoSQL: 5 Critical Differences
Bigtable is essentially a NoSQL database service; it is not a relational database and does not support SQL or multi-row transactions. This fact makes Bigtable unsuitable for a wide range of applications; it is only a suitable solution for mutable data sets with a minimum data size of one terabyte—with anything less, the overhead is too high.
The main characteristics of Bigtable are that it can scale horizontally (giving it very high read/write throughput as a result) and its key-columns—i.e. one key can have multiple columns, which can be updated. Performance suffers when storing individual data elements more extensive than 10 megabytes. If you need a comprehensive storage solution for unstructured objects (e.g. video files), then cloud storage is most likely a better option. Bigtable is best suited to the following scenarios: time-series data (e.g. CPU and memory usage over time for multiple servers), financial data (e.g. transaction histories, stock prices, and currency exchange rates), and IoT (Internet of Things) use cases.
With BigQuery, it is possible to run complex analytical SQL-based queries under large sets of data. However, users can also leverage NoSQL techniques. For example, it is encouraged to denormalize data when designing schemas and loading data to BigQuery for performance purposes.
BigQuery is a smart choice for queries that require a “table scan,” or when you need to look across the entire database (e.g. calculating sums, averages, counts, or groupings). However, there are many limitations to BigQuery, such as a limited number of updates in the table per day, as well as restrictions on data size per request. This is because BigQuery is not intended to be a substitute for traditional relational databases; it focuses on running analytical queries as opposed to basic CRUD operations.
Bigtable vs. BigQuery: Mutable or Immutable?
Data is immutable within BigQuery: an uploaded object cannot change throughout its storage lifetime once written, and data cannot be deleted or altered for a pre-determined length of time. If an existing record needs to be modified, the partition must be rewritten. BigQuery is append-only, and will automatically drop partitions older than the preconfigured time to live to limit the volume of stored data.
Bigtable stores data in scalable tables, each of which is a sorted key/value map that is indexed by a column key, row key, and a timestamp. This enables mutable data and fast key-based lookup. Each row typically describes a single entity, and columns contain individual values for each row. Read and write operations of data to rows are atomic, regardless of how many columns are read or written within that row.
How Integrate.io Can Help with BigQuery
To sum up, here are the major differences between Bigtable and BigQuery:
- Bigtable is a NoSQL database service with mutable data that is best suited for OLTP use cases.
- BigQuery is a SQL data warehouse with immutable data that is best suited for OLAP use cases, such as business intelligence and analytics.
BigQuery is an excellent choice for businesses that need a cloud data warehousing solution. But there’s still one question left: how will you extract your enterprise data, scattered across a wide variety of sources, and get it into a centralized location like BigQuery?
Integrate.io is here to help. The robust, feature-rich Integrate.io platform is an automated ETL and data integration tool for building pipelines between your data sources and your cloud data warehouse. With more than 100 pre-built integrations—including BigQuery—and a user-friendly drag-and-drop interface, it’s never been easier to set up and manage your data integration workflows.
Ready to see what Integrate.io can do for you? Get in touch with our team of data experts today for a chat about your situation, or to start your 14-day pilot of the Integrate.io platform.