Peter Zaitsev, Founder and CEO of Percona and coauthor of High Performance MySQL, 3rd edition
High Performance MySQL
FOURTH EDITION
Proven Strategies for Operating at Scale
With Early Release ebooks, you get books in their earliest form— the author’s raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editors: Virginia Wilson and Andy Kwan
Production Editor: Elizabeth Faerm
Copyeditor: Shannon Turlington
Proofreader: Kim Cofer
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
December 2021: Fourth Edition
Revision History for the Early Release
2021-01-05: First Release
2021-07-29: Second Release
2021-11-12: Third Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492080510 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. High Performance MySQL, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author(s) and do not represent the publisher’s views. While the publisher and the author(s) have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author(s) disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-08044-2
[LSI]
Preface
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form— the author’s raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles.
This will be the Preface of the final book. Please note that the GitHub repo will be made active later on.
If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at vwilson@oreilly.com.
The official documentation maintained by Oracle gives you the knowledge necessary to install, configure, and interact with MySQL. This book serves as a companion to that documentation, helping you understand how best to leverage MySQL as a powerful data platform for your use case.
This edition also expands on the growing role of compliance and security as parts of operating a database footprint. New realities such as privacy laws and data sovereignty have changed how companies build their products, and that naturally introduces new complexities in how the technical architecture evolves.
Who This Book Is For
This book is first and foremost for engineers looking to grow their expertise in running MySQL. This edition assumes its audience is familiar with the basic principles of why you want to use a relational database management system (RDBMS). We also assume some experience with general system administration, networking, and operating systems.
We will offer you proven strategies for running MySQL at scale with a modern architecture and more up-to-date tooling and practices.
Ultimately, we hope that the knowledge you gain from this book of MySQL’s internals and scaling strategies will help you in scaling the data storage layer at your organization. And we hope that your newfound insight will help you to learn and practice a methodical approach to designing, maintaining, and troubleshooting an architecture that is built on MySQL.
What Is Different in This Edition
High Performance MySQL has been a part of the database engineering community for years, with past editions released in 2004, 2008, and 2012. In these previous editions, the goal was always to teach developers and administrators how to optimize MySQL for every drop of performance by focusing on deep internal design, explaining what various tuning settings mean, and arming the user with the knowledge to be effective in changing these settings. This edition maintains the same goal but with a different focus.
Since the third edition, the MySQL ecosystem has seen a lot of changes. Three new major versions have gone general availability (GA). The tooling landscape expanded significantly beyond Perl and
Bash scripts and into full-fledged tooling solutions. Entirely new open source projects have been built that change how organizations manage scaling MySQL.
Even the traditional database administrator (DBA) role has evolved. There’s an old joke in the industry that says that DBA stands for “Don’t Bother Asking.” DBAs had a reputation for being speed bumps in the software development life cycle (SDLC), not explicitly because of any curmudgeonly attitude, but simply because databases weren’t evolving as fast as the rest of the SDLC around them.
With books like Database Reliability Engineering: Designing and Operating Resilient Database Systems by Laine Campbell and Charity Majors (O’Reilly), it has become the new reality that technical organizations look to database engineers more as enablers of business growth and less as the sole operators of all databases. Where once a DBA’s primary day-to-day work involved schema design and query optimization, they are now responsible for teaching those skills to developers and for managing systems that allow developers to deploy their own schema changes quickly and safely.
With these changes, the focus should no longer be on optimizing MySQL to get a few percentage points faster. We think that High Performance MySQL is now about giving people the information they need to make educated decisions about how to best use MySQL. This begins by understanding how MySQL is designed, which leads to understanding what MySQL is and is not good at. Modern releases of MySQL offer reasonably sane defaults, and there’s very little tuning you need to do unless you’re experiencing a very specific scaling problem. Modern teams are now dealing with schema changes, compliance issues, and sharding. We want High Performance MySQL to be a comprehensive guide to how modern companies run MySQL at scale.
Conventions Used in This Book
The following typographical conventions are used in this book:

Italic
Used for new terms, URLs, email addresses, usernames, hostnames, filenames, file extensions, pathnames, directories, and Unix commands and utilities.
Constant width
Indicates elements of code, configuration options, database and table names, variables and their values, functions, modules, the contents of files, or the output from commands.
Constant width bold
Shows commands or other text that should be typed literally by the user. Also used for emphasis in command output.
Constant width italic
Shows text that should be replaced with user-supplied values.
TIP
This icon signifies a tip or suggestion.
NOTE
This icon signifies a tip, suggestion, or general note.
WARNING
This icon indicates a warning or caution.
Using Code Examples
This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You don’t need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book doesn’t require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code doesn’t require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but don’t require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “High Performance MySQL, Fourth Edition, by Silvia Botros and Jeremy Tinley et al. (O’Reilly). Copyright 2022 Silvia Botros and Jeremy Tinley, 978-1-492-08051-0.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning
NOTE
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at
https://oreil.ly/RLZis.
Email bookquestions@oreilly.com to comment or ask technical questions about this book.
For news and information about our books and courses, visit http://oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments for the Fourth Edition
From Silvia
First and foremost, I’d like to thank my family. My parents, who sacrificed stable jobs and lives in Egypt to bring me and my brother to the United States. My husband, Armea, for supporting me through this and all the past years of my career as I took on one challenge after the next, culminating in this accomplishment.
I started off in tech as an immigrant who left her college years in the Middle East to achieve her dream of moving to the United States. After earning my degree in a state university in California, I took a job in New York City, and I remember the second edition of this book being the very first tech book I bought with my own money that was not a college book. I owe the authors of the previous editions a lot of the fundamental lessons that prepared me to manage databases during my career.
The support and encouragement of so many people I have worked with in my career have gotten me to write this edition of the book that taught me so much in the past. I’d like to thank Tim Jenkins, the former CTO of SendGrid, for hiring me for the job of a lifetime even though I told him in my interview that he was using MySQL
replication the wrong way. For trusting me with what turned out to be a rocket ship.
I’d like to thank all the amazing women in tech who have been my support network and cheerleaders. Special thanks to Camille Fournier and Dr. Nicole Forsgren for writing the two books that have influenced the past few years of my career and changed my view on my day-to-day work.
Thank you to my team at Twilio. To Sean Kilgore for making me a much better engineer who cares about a lot more than just the databases. To John Martin for being the most optimistic human I ever worked with. Thanks to Laine Campbell and her PalominoDB team (later acquired by Pythian) who helped support me and taught me so much during the toughest years, and to Baron Schwartz for encouraging me to write about my experiences.
Finally, thanks to Virginia Wilson for being an excellent editor, for helping turn my stream of ideas into sentences that make sense and for helping me through this process with so much support and grace.
From Jeremy
When Silvia approached me to help with this book, it was in the middle of an extraordinarily stressful period of most people’s lives— the global pandemic, which started in 2020. I was unsure that I wanted to add any more stress to my life. My wife, Selena, told me that I would regret it if I didn’t accept, and I know better than to argue with her. She has always supported me and encouraged me to be the best human being I can be. I will forever love her for all that she does for me.
To my family, coworkers, and community friends: I would have never gotten to this point without you. You all taught me how to be who I am today. My career is the sum of my experiences with you all. You taught me how to accept criticism, how to lead by example, how to
fail and recover, and most importantly, that the sum is better than the individual.
Lastly, I want to thank Silvia, who trusted me to bring a shared understanding but different perspective to this book. I hope I met your expectations.
A Thank You to Tech Reviewers
The authors also want to recognize the tech reviewers who helped get this book to where it is today: Aisha Imran, Andrew Regner, Baron Schwartz, Daniel Nichter, Hayley Anderson, Ivan Mora Perez, Jam Leoni, Jaryd Remillard, Jennifer Davis, Jeremy Cole, Keith Wells, Kris Hamoud, Nick Vyzas, Shubheksha Jalan, Tom Krouper, and Will Gunty. Thank you all for your time and effort.
1 Famously, people often used MySQL as a queue and then learned the hard way why it was bad. The most cited reasons were the overhead of polling for new queue actions, the management of locking records for processing, and the unwieldy size of queue tables as data grows over time.
Chapter 1. MySQL Architecture
A NOTE FOR EARLY RELEASE READERS
With Early Release ebooks, you get books in their earliest form— the author’s raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles.
This will be Chapter 1 of the final book. Please note that the GitHub repo will be made active later on.
If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at vwilson@oreilly.com.
MySQL’s architectural characteristics make it useful for a wide range of purposes. Although it is not perfect, it is flexible enough to work well in both small and large environments. These range from a personal website up to large-scale enterprise applications. To get the most from MySQL, you need to understand its design so that you can work with it, not against it.
This chapter provides a high-level overview of the MySQL server architecture, the major differences between the storage engines, and why those differences are important. We’ve tried to explain MySQL by simplifying the details and showing examples. This discussion will be useful for those new to database servers as well as readers who are experts with other database servers.
MySQL’s Logical Architecture
A good mental picture of how MySQL’s components work together will help you understand the server. Figure 1-1 shows a logical view of MySQL’s architecture.
The topmost layer, clients, contains the services that aren’t unique to MySQL. They’re services most network-based client/server tools or servers need: connection handling, authentication, security, and so forth.
The second layer is where things get interesting. Much of MySQL’s brains are here, including the code for query parsing, analysis, optimization, and all the built-in functions (e.g., dates, times, math, and encryption). Any functionality provided across storage engines lives at this level: stored procedures, triggers, and views, for example.
The third layer contains the storage engines. They are responsible for storing and retrieving all data stored “in” MySQL. Like the various filesystems available for GNU/Linux, each storage engine has its own benefits and drawbacks. The server communicates with them through the storage engine API. This API hides differences between storage engines and makes them largely transparent at the query layer. It also contains a couple of dozen low-level functions that perform operations such as “begin a transaction” or “fetch the row that has this primary key.” The storage engines don’t parse SQL or communicate with one another; they simply respond to requests from the server.
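A quick way to see this layer from the client side (a minimal illustration; SHOW ENGINES is a standard MySQL command) is to ask the server which storage engines it supports:

SHOW ENGINES;
-- The Support column reports YES, NO, or DEFAULT for each engine;
-- on modern MySQL releases, InnoDB is listed as DEFAULT.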
Connection Management and Security
By default, each client connection gets its own thread within the server process. The connection’s queries execute within that single
thread, which in turn resides on one core or CPU. The server maintains a cache of ready-to-use threads, so they don’t need to be created and destroyed for each new connection.
When clients (applications) connect to the MySQL server, the server needs to authenticate them. Authentication is based on username, originating host, and password. X.509 certificates can also be used across a Transport Layer Security (TLS) connection. Once a client has connected, the server verifies whether the client has privileges for each query it issues (e.g., whether the client is allowed to issue a SELECT statement that accesses the Country table in the world database).
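As a sketch of how those privileges are set up (the username, host pattern, and password here are hypothetical), you could create a client account that is allowed to do nothing but read that table:

-- Create a user that may connect only from hosts matching 10.0.%
CREATE USER 'app_ro'@'10.0.%' IDENTIFIED BY 'example-password';
-- Grant it SELECT on world.Country and nothing else
GRANT SELECT ON world.Country TO 'app_ro'@'10.0.%';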
Optimization and Execution
MySQL parses queries to create an internal structure (the parse tree) and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, choosing which indexes to use, and so on. You can pass hints to the optimizer through special keywords in the query, affecting its decision-making process. You can also ask the server to explain various aspects of optimization. This lets you know what decisions the server is making and gives you a reference point for reworking queries, schemas, and settings to make everything run as efficiently as possible. There is more detail on this in Chapter 8.
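For example, here is a minimal sketch of asking the server to explain a query (the table and column come from the sample world database used earlier in this chapter):

EXPLAIN SELECT Name FROM world.Country WHERE Continent = 'Asia';
-- The output includes the chosen index (if any), the estimated number
-- of rows to examine, and the access type, which together show you the
-- optimizer's plan for the query.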
The optimizer does not really care what storage engine a particular table uses, but the storage engine does affect how the server optimizes the query. The optimizer asks the storage engine about some of its capabilities and the cost of certain operations as well as for statistics on the table data. For instance, some storage engines support index types that can be helpful to certain queries. You can read more about schema optimization and indexing in Chapters 6 and 7.
In older versions, MySQL made use of an internal query cache to see if it could serve the results from there. However, as concurrency increased, the query cache became a notorious bottleneck. As of MySQL 5.7.20, the query cache was officially deprecated as a MySQL feature, and in the 8.0 release, the query cache is fully removed. Even though the query cache is no longer a core part of the MySQL server, caching frequently served result sets is a good practice. While outside the scope of this book, a popular design pattern is to cache data in memcached or Redis.
Concurrency Control
Any time more than one query needs to change data at the same time, the problem of concurrency control arises. For our purposes in this chapter, MySQL has to do this at two levels: the server level and the storage-engine level. We will give you a simplified overview of how MySQL deals with concurrent readers and writers, so you have the context you need for the rest of this chapter.
To illustrate how MySQL handles concurrent work on the same set of data, we will use a traditional spreadsheet file as an example. A spreadsheet consists of rows and columns, much like a database table. Assume the file is on your laptop and only you have access to it. There are no potential conflicts; only you can make changes to the file. Now, imagine you need to collaborate with a coworker on that spreadsheet. It is now on a shared server that both of you have access to. What happens when both of you need to make changes to this file at the same time? What if we have an entire team of people actively trying to edit, add, and remove cells from this spreadsheet? We can say that they should take turns making changes, but that is not efficient. We need an approach for allowing concurrent access to a high-volume spreadsheet.
Read/Write Locks
Reading from the spreadsheet isn’t as troublesome. There’s nothing wrong with multiple clients reading the same file simultaneously; because they aren’t making changes, nothing is likely to go wrong. What happens if someone tries to delete cell number A25 while others are reading the spreadsheet? It depends, but a reader could come away with a corrupted or inconsistent view of the data. So, to be safe, even reading from a spreadsheet requires special care.
If you think of the spreadsheet as a database table, it’s easy to see that the problem is the same in this context. In many ways, a spreadsheet is really just a simple database table. Modifying rows in a database table is very similar to removing or changing the content of cells in a spreadsheet file.
The solution to this classic problem of concurrency control is rather simple. Systems that deal with concurrent read/write access typically implement a locking system that consists of two lock types. These locks are usually known as shared locks and exclusive locks, or read locks and write locks.
Without worrying about the actual locking mechanism, we can describe the concept as follows. Read locks on a resource are shared, or mutually nonblocking: many clients can read from a resource at the same time and not interfere with one another. Write locks, on the other hand, are exclusive—that is, they block both read locks and other write locks—because the only safe policy is to have a single client writing to the resource at a given time and to prevent all reads when a client is writing.
In the database world, locking happens all the time: MySQL has to prevent one client from reading a piece of data while another is changing it. If a database server is performing in an acceptable manner, this management of locks is fast enough to not be noticeable to the clients. We will discuss in Chapter 8 how to tune your queries to avoid performance issues caused by locking.
Lock Granularity
One way to improve the concurrency of a shared resource is to be more selective about what you lock. Rather than locking the entire resource, lock only the part that contains the data you need to change. Better yet, lock only the exact piece of data you plan to change. Minimizing the amount of data that you lock at any one time lets changes to a given resource occur simultaneously, as long as they don’t conflict with each other.
Unfortunately, locks are not free—they consume resources. Every lock operation—getting a lock, checking to see whether a lock is free, releasing a lock, and so on—has overhead. If the system spends too much time managing locks instead of storing and retrieving data, performance can suffer.
A locking strategy is a compromise between lock overhead and data safety, and that compromise affects performance. Most commercial database servers don’t give you much choice: you get what is known as row-level locking in your tables, with a variety of often complex ways to give good performance with many locks. Locks are how databases implement consistency guarantees. An expert operator of a database would have to go as far as reading the source code to determine the most appropriate set of tuning configurations to optimize this trade-off of speed versus data safety.
MySQL, on the other hand, does offer choices. Its storage engines can implement their own locking policies and lock granularities. Lock management is a very important decision in storage-engine design; fixing the granularity at a certain level can improve performance for certain uses yet make that engine less suited for other purposes. Because MySQL offers multiple storage engines, it doesn’t require a single general-purpose solution. Let’s have a look at the two most important lock strategies.
Table locks
The most basic locking strategy available in MySQL, and the one with the lowest overhead, is table locks. A table lock is analogous to the spreadsheet locks described earlier: it locks the entire table. When a client wishes to write to a table (insert, delete, update, etc.), it acquires a write lock. This keeps all other read and write operations at bay. When nobody is writing, readers can obtain read locks, which don’t conflict with other read locks.
Table locks have variations for improved performance in specific situations. For example, READ LOCAL table locks allow some types of concurrent write operations. The write and read lock queues are separate, with the write queue having entirely higher priority than the read queue, as shown in the sketch below.
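As a brief sketch of explicit table locking (the table name is illustrative):

LOCK TABLES Country READ; -- other sessions may still read, but writes will block
SELECT COUNT(*) FROM Country;
UNLOCK TABLES; -- release the lock so queued writers can proceed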
Row locks
The locking style that offers the greatest concurrency (and carries the greatest overhead) is the use of row locks. Going back to the spreadsheet analogy, row locks would be the same as locking just the row in the spreadsheet. This strategy allows multiple people to edit different rows concurrently without blocking one another. This enables the server to take more concurrent writes, but the cost is more overhead in having to keep track of who has each row lock, how long they have been open, and what kind of row locks they are, as well as cleaning up locks when they are no longer needed.
Row locks are implemented in the storage engine, not the server. The server is mostly unaware of locks implemented in the storage engines, and as you’ll see later in this chapter and throughout the book, the storage engines all implement locking in their own ways.
Transactions
You can’t examine the more advanced features of a database system for very long before transactions enter the mix. A transaction is a group of SQL statements that are treated atomically, as a single unit of work. If the database engine can apply the entire group of statements to a database, it does so, but if any of them can’t be done because of a crash or other reason, none of them is applied. It’s all or nothing.
Little of this section is specific to MySQL. If you’re already familiar with ACID transactions, feel free to skip ahead to “Transactions in MySQL”.
A banking application is the classic example of why transactions are necessary. Imagine a bank’s database with two tables: checking and savings. To move $200 from Jane’s checking account to her savings account, you need to perform at least three steps:
1. Make sure her checking account balance is greater than $200.
2. Subtract $200 from her checking account balance.
3. Add $200 to her savings account balance.
The entire operation should be wrapped in a transaction so that if any one of the steps fails, any completed steps can be rolled back.
You start a transaction with the START TRANSACTION statement and then either make its changes permanent with COMMIT or discard the changes with ROLLBACK. So the SQL for our sample transaction might look like this:
1 START TRANSACTION;
2 SELECT balance FROM checking WHERE customer_id = 10233276;
3 UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
4 UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276;
5 COMMIT;
Transactions alone aren’t the whole story. What happens if the database server crashes while performing line 4? Who knows? The customer probably just lost $200. What if another process comes along between lines 3 and 4 and removes the entire checking account balance? The bank has given the customer a $200 credit without even knowing it.
And there are a lot more failure possibilities in this sequence of operations. You could see connection drops, timeouts, or even a crash of the database server midway through the operations. This is typically why highly complex and slow two-phase commit systems exist: to mitigate all sorts of failure scenarios.
Transactions aren’t enough unless the system passes the ACID test. ACID stands for atomicity, consistency, isolation, and durability. These are tightly related criteria that a data-safe transaction processing system must meet:
Atomicity
A transaction must function as a single indivisible unit of work so that the entire transaction is either applied or never committed. When transactions are atomic, there is no such thing as a partially completed transaction: it’s all or nothing.
Consistency
The database should always move from one consistent state to the next. In our example, consistency ensures that a crash between lines 3 and 4 doesn’t result in $200 disappearing from
the checking account. If the transaction is never committed, none of the transaction’s changes are ever reflected in the database.
Isolation
The results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank account summary runs after line 3 but before line 4 in our example, it will still see the $200 in the checking account. When we discuss isolation levels later in this chapter, you’ll understand why we said “usually invisible.”
Durability
Once committed, a transaction’s changes are permanent. This means the changes must be recorded such that data won’t be lost in a system crash. Durability is a slightly fuzzy concept, however, because there are actually many levels. Some durability strategies provide a stronger safety guarantee than others, and nothing is ever 100% durable (if the database itself were truly durable, then how could backups increase durability?).
ACID transactions and the guarantees provided through them in the InnoDB engine specifically are one of the strongest and most mature features in MySQL. While they come with certain throughput tradeoffs, when applied appropriately they can save you from implementing a lot of complex logic in the application layer.
Isolation Levels
Isolation is more complex than it looks. The ANSI SQL standard defines four isolation levels. If you are new to the world of databases, we highly recommend you get familiar with the ANSI SQL standard before returning to read about the specific MySQL implementation. The goal of this standard is to define the rules for which changes are and aren’t visible inside and outside a transaction. Lower isolation levels typically allow higher concurrency and have lower overhead.
NOTE
Each storage engine implements isolation levels slightly differently, and they don’t necessarily match what you might expect if you’re used to another database product (thus, we won’t go into exhaustive detail in this section). You should read the manuals for whichever storage engines you decide to use.
Let’s take a quick look at the four isolation levels:

READ UNCOMMITTED
In the READ UNCOMMITTED isolation level, transactions can view the results of uncommitted transactions. At this level, many problems can occur unless you really, really know what you are doing and have a good reason for doing it. This level is rarely used in practice because its performance isn’t much better than the other levels, which have many advantages. Reading uncommitted data is also known as a dirty read.
READ COMMITTED
The default isolation level for most database systems (but not MySQL!) is READ COMMITTED. It satisfies the simple definition of isolation used earlier: a transaction will continue to see changes made by transactions that were committed after it began, and its changes won’t be visible to others until it has committed. This level still allows what’s known as a nonrepeatable read. This means you can run the same statement twice and see different data.
REPEATABLE READ
REPEATABLE READ solves the problems that READ UNCOMMITTED allows. It guarantees that any rows a transaction reads will “look the same” in subsequent reads within the same transaction, but in theory it still allows another tricky problem: phantom reads. Simply put, a phantom read can happen when you select some range of rows, another transaction inserts a new row into the range, and then you select the same range again; you will then see the new “phantom” row. InnoDB and XtraDB solve the phantom read problem with multiversion concurrency control, which we explain later in this chapter.
REPEATABLE READ is MySQL’s default transaction isolation level.
SERIALIZABLE
The highest level of isolation, SERIALIZABLE, solves the phantom read problem by forcing transactions to be ordered so that they can’t possibly conflict. In a nutshell, SERIALIZABLE places a lock on every row it reads. At this level, a lot of timeouts and lock contention can occur. We’ve rarely seen people use this
isolation level, but your application’s needs might force you to accept the decreased concurrency in favor of the data safety that results.
Table 1-1 summarizes the various isolation levels and the drawbacks associated with each one.
Table 1-1. ANSI SQL isolation levels

Isolation level     Dirty reads possible   Nonrepeatable reads possible   Phantom reads possible   Locking reads
READ UNCOMMITTED    Yes                    Yes                            Yes                      No
READ COMMITTED      No                     Yes                            Yes                      No
REPEATABLE READ     No                     No                             Yes                      No
SERIALIZABLE        No                     No                             No                       Yes
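To make these differences concrete, here is a sketch of a nonrepeatable read using the checking table from our earlier example; under READ COMMITTED the second SELECT sees the new balance, while under REPEATABLE READ it would not:

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT balance FROM checking WHERE customer_id = 10233276;

-- Session B, meanwhile, commits a change (autocommit is on by default)
UPDATE checking SET balance = balance - 50.00 WHERE customer_id = 10233276;

-- Session A, still inside the same transaction
SELECT balance FROM checking WHERE customer_id = 10233276; -- now differs
COMMIT;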
Deadlocks
A deadlock is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle of dependencies. Deadlocks occur when transactions try to lock resources in a different order. They can happen whenever multiple transactions lock the same resources. For example, consider these two transactions running against a StockPrice table, which has a primary key of (stock_id, date):
Transaction 1

START TRANSACTION;
UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 AND date = '2020-05-01';
UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 AND date = '2020-05-02';
COMMIT;

Transaction 2

START TRANSACTION;
UPDATE StockPrice SET high = 20.12 WHERE stock_id = 3 AND date = '2020-05-02';
UPDATE StockPrice SET high = 47.20 WHERE stock_id = 4 AND date = '2020-05-01';
COMMIT;
Each transaction will execute its first query and update a row of data, locking that row in the primary key index and any additional unique index it is part of in the process. Each transaction will then attempt to update its second row, only to find that it is already locked. The two transactions will wait forever for each other to complete unless something intervenes to break the deadlock. We cover further in Chapter 7 how indexing can make or break the performance of your queries as your schema evolves.
To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems, such as the InnoDB storage engine, will notice circular dependencies and return an error instantly. This can be a good thing —otherwise, deadlocks would manifest themselves as very slow queries. Others will give up after the query exceeds a lock wait timeout, which is not always good. The way InnoDB currently handles deadlocks is to roll back the transaction that has the fewest exclusive row locks (an approximate metric for which will be the easiest to roll back).
Lock behavior and order are storage engine specific, so some storage engines might deadlock on a certain sequence of statements
even though others won’t. Deadlocks have a dual nature: some are unavoidable because of true data conflicts, and some are caused by how a storage engine works.
Once they occur, deadlocks cannot be broken without rolling back one of the transactions, either partially or wholly. They are a fact of life in transactional systems, and your applications should be designed to handle them. Many applications can simply retry their transactions from the beginning, and unless they encounter another deadlock, they should be successful.
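When InnoDB rolls back a transaction this way, the client receives an error (error code 1213) that the application can catch and use as its signal to retry. To examine what happened, you can inspect the engine status (a standard command; only part of its long output concerns deadlocks):

SHOW ENGINE INNODB STATUS;
-- The LATEST DETECTED DEADLOCK section lists the transactions,
-- statements, and locks involved in the most recent deadlock.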
Transaction Logging
Transaction logging helps make transactions more efficient. Instead of updating the tables on disk each time a change occurs, the storage engine can change its in-memory copy of the data. This is very fast. The storage engine can then write a record of the change to the transaction log, which is on disk and therefore durable. This is also a relatively fast operation, because appending log events involves sequential I/O in one small area of the disk instead of random I/O in many places. Then, at some later time, a process can update the table on disk. Thus, most storage engines that use this technique (known as write-ahead logging) end up writing the changes to disk twice.
If there’s a crash after the update is written to the transaction log but before the changes are made to the data itself, the storage engine can still recover the changes upon restart. The recovery method varies between storage engines.
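In InnoDB, this transaction log is the redo log, and how aggressively it is flushed to disk is configurable. As a quick sketch, you can inspect the setting that controls this durability trade-off:

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 (the default) flushes the log to disk at every commit, the most
-- durable option; 0 and 2 trade some durability for throughput.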
Transactions in MySQL
Storage engines are the software that drives how data will be stored and retrieved from disk. While MySQL has traditionally offered a number of storage engines that support transactions, InnoDB is now the gold standard and the recommended engine to use. Transaction
primitives described here will be based on transactions in the InnoDB engine.
Understanding AUTOCOMMIT
By default, a single INSERT, UPDATE, or DELETE statement is implicitly wrapped in a transaction and committed immediately. This is known as AUTOCOMMIT mode. By disabling this mode, you can execute a series of statements within a transaction and, at conclusion, COMMIT or ROLLBACK.
You can enable or disable the AUTOCOMMIT variable for the current connection by using a SET command. The values 1 and ON are equivalent, as are 0 and OFF. When you run with AUTOCOMMIT=0, you are always in a transaction until you issue a COMMIT or ROLLBACK. MySQL then starts a new transaction immediately. Additionally, with AUTOCOMMIT enabled, you can begin a multistatement transaction by using the keyword BEGIN or START TRANSACTION. Changing the value of AUTOCOMMIT has no effect on nontransactional tables, which have no notion of committing or rolling back changes.
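A minimal sketch of toggling this behavior in a session (reusing the checking table from earlier):

SHOW VARIABLES LIKE 'autocommit'; -- ON by default
SET SESSION autocommit = 0; -- statements now accumulate in an open transaction
UPDATE checking SET balance = balance + 10 WHERE customer_id = 10233276;
ROLLBACK; -- discards the change above
SET SESSION autocommit = 1; -- back to one transaction per statement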
Certain commands, when issued during an open transaction, cause MySQL to commit the transaction before they execute. These are typically DDL commands that make significant changes, such as ALTER TABLE, but LOCK TABLES and some other statements also have this effect. Check your version’s documentation for the full list of commands that automatically commit a transaction.
MySQL lets you set the isolation level using the SET TRANSACTION ISOLATION LEVEL command, which takes effect when the next transaction starts. You can set the isolation level for the whole server in the configuration file or just for your session:
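For example (READ COMMITTED here is purely illustrative):

-- For the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Or server-wide in the configuration file:
-- [mysqld]
-- transaction_isolation = READ-COMMITTED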