Frequently Asked Questions¶
This section discusses frequently asked questions about MemSQL.
- What is MemSQL?
- Is MemSQL a storage engine for MySQL?
- Is MemSQL a Just In Time compiler for SQL?
- Does MemSQL support fully ACID transactions?
- What kind of workloads work best on MemSQL?
- Is MemSQL a row-based or column-based store?
- How does MemSQL’s lock-free storage engine work?
- What is the advantage of MemSQL over MySQL with memory tables or MySQL on a ramdisk?
- Query Compilation
- Deployment and Management
What is MemSQL?¶
MemSQL is a next generation database that removes the most common bottleneck applications hit today: disk. By offering a familiar relational interface to an in-memory data tier, MemSQL empowers developers with the technology web-scale companies use to cope with massive traffic and growth. MemSQL offers orders of magnitude improvements in write and read performance and greatly simplifies application development and maintenance by offering wire-compatibility with MySQL.
The key features that make MemSQL fast are memory-optimized lock-free indexes and SQL-to-C++ code generation. MemSQL is also durable to disk by default.
Is MemSQL a storage engine for MySQL?¶
No. MemSQL is a new database that is compatible with the MySQL client. MemSQL includes its own storage engine and execution engine built around lock-free data structures and SQL-to-C++ code generation.
Is MemSQL a Just In Time compiler for SQL?¶
Yes. When MemSQL encounters a new kind of query, it compiles a plan and loads it into the database server as a shared object. Query compilation gives MemSQL the same boost over traditional relational database management systems that C++ has over interpreted languages.
Does MemSQL support fully ACID transactions?¶
Yes. Currently, MemSQL supports the READ COMMITTED isolation level and each query runs in a separate transaction.
What kind of workloads work best on MemSQL?¶
MemSQL works best on workloads with highly concurrent read and write queries. Each query is individually executed on exactly one core. Read queries are never blocked by other reads or writes because of multi-version concurrency control.
Is MemSQL a row-based or column-based store?¶
MemSQL is a row-based store and therefore offers very fast seeks and writes to individual rows.
How does MemSQL’s lock-free storage engine work?¶
MemSQL’s storage engine uses multi version concurrency control with lock-free hash tables and lock-free skip lists which allow highly concurrent reads and writes at very high throughput. Reads in MemSQL are never blocked, but updates to the same row can conflict.
What is the advantage of MemSQL over MySQL with memory tables or MySQL on a ramdisk?¶
Two common techniques for running MySQL in memory are the MEMORY storage engine and running a disk-based storage engine (like InnoDB) with a large in-memory buffer pool. MySQL’s MEMORY storage engine operates with table-level locks and is not optimized for high-concurrency workloads. In fact, MySQL performs much better with InnoDB configured to use a large buffer pool.
While running an existing storage engine like InnoDB in memory can alleviate some of the bottleneck involved with disk, MemSQL has two distinguishing, memory-optimized features that enable it to perform significantly better than InnoDB in memory:
- Lock-free data structures. MemSQL uses memory-optimized, lock-free skip lists and hash tables as its indexes. Unlike B-Trees, these data structures are designed from the ground up to be fast in memory.
- Code generation. Lock-free data structures are so fast that dynamic SQL interpretation quickly becomes the limiting factor for query execution. With code generation, MemSQL compiles SQL down to native code for maximum performance.
How does query compilation work in MemSQL? Why is it important?¶
Traditional relational database management systems interpret SQL queries the same way interpreted languages like Python and Ruby run programs. MemSQL compiles queries into machine code like a C or C++ compiler. This is achieved by transforming SQL into C++ (Code Generation) and then compiling the generated code. Code-generation is a one-time only operation. Once the query plan has been compiled, future queries that match the plan bypass code generation and the compiler entirely.
Before databases ran in memory, the overhead associated with managing disk I/O dominated the CPU cost of query execution for write heavy workloads. With data residing in memory, however, query execution becomes the limiting factor of the system. MemSQL uses query compilation to optimize the execution code path ahead of time. Without the overhead of dynamically interpreting SQL, MemSQL can execute queries as fast or faster than optimized NoSQL solutions.
Does this mean that if I change a constant in a query MemSQL will compile a new plan?¶
In most cases, no. MemSQL strips out the numeric and string parameters from a query and correlates the resulting string to a compiled plan. This string is referred to as a parametrized query. For example, SELECT * FROM foo WHERE id=22 AND name='bar' is converted to SELECT * FROM foo WHERE id=@ AND name=^.
The one exception to this rule is constants in the projection clause of a query. These constants are compiled directly into the plan’s assembly code for performance reasons. For example, SELECT id + 1 FROM foo WHERE name='bar' is converted to SELECT id + 1 FROM foo WHERE name=^.
Does being in-memory mean that MemSQL will lose all data upon system failure or restart?¶
No. Unlike traditional relational database management systems, MemSQL uses RAM as the primary storage for data. However, MemSQL backs up data to disk with snapshots and transaction logs. These features can be tuned all the way from synchronous durability (every write transaction is recorded on disk before the query completes) to purely in-memory durability (maximum sustained throughput on writes).
On restart, MemSQL uses the snapshot and log files to recover its state to what it was before shutting down. Because the recovery process is parallelized across CPUs, the bottleneck in this process is the sequential hard drive speed.
If MemSQL writes data to disk, how can it be faster than disk-based databases?¶
Traditional relational database management systems use disk as the primary storage for data and memory as a cache. Managing this caching layer (the buffer pool) adds I/O and CPU overhead. Furthermore, because data must be efficiently synced back to disk, these systems must store data in B-Trees.
On the other hand, MemSQL stores data primarily in memory and backs it up to disk in a compact format. As a result, MemSQL uses only sequential I/O and the transaction log size is significantly smaller. Furthermore, MemSQL can use memory-optimized, lock-free skip lists and hash tables that cannot be managed in a buffer pool.
Deployment and Management¶
How can I get a copy of MemSQL?¶
Visit the download page to obtain a license key and download link for MemSQL.
Does MemSQL run in the cloud?¶
How can I backup MemSQL?¶
MemSQL automatically backs itself up with snapshot and log files. When you restart the server, it will read from these files to restore the state of the database. If you want to make incremental backups of MemSQL, use mysqldump. MemSQL supports LOCK TABLES, so you can use mysqldump to take a consistent snapshot of the database.
How much disk space should I allocate for MemSQL?¶
MemSQL uses disk for two types of storage:
- Snapshot and log files that backup data stored in MemSQL. You should allocate about as much space on disk as memory on your machine (10 GB for the developer build).
- Shared object files that are the result of code generation. Data Definition Language (DDL) queries like CREATE TABLE and ALTER TABLE cost 50 MB each on average. Data Manipulation Language (DML) queries like INSERT, UPDATE, DELETE, and SELECT require about 5 MB per unique plan (see Query Compilation).
Therefore, you should allocate roughly the amount of memory on your machine + 50 MB for each DDL plan + 5 MB for each DML plan. Note that the exact disk requirements will vary with the application, so it is advisable (and usually cheap) to allocate some extra disk space.