MySQL
Airbyte's certified MySQL connector offers the following features:
- Multiple methods of keeping your data fresh, including Change Data Capture (CDC) using the binlog.
- All available sync modes, providing flexibility in how data is delivered to your destination.
- Reliable replication at any table size with checkpointing and chunking of database reads.
The contents below include a 'Quick Start' guide, advanced setup steps, and reference information (data type mapping and changelogs).
Quick Start
Here is an outline of the minimum required steps to configure a MySQL connector:
- Create a dedicated read-only MySQL user with permissions for replicating data
- Enable binary logging on your MySQL server
- Create a new MySQL source in the Airbyte UI using CDC logical replication
- (Airbyte Cloud Only) Allow inbound traffic from Airbyte IPs
Once this is complete, you will be able to select MySQL as a source for replicating data.
Step 1: Create a dedicated read-only MySQL user
These steps create a dedicated read-only user for replicating data. Alternatively, you can use an existing MySQL user in your database.
The following commands will create a new user:
CREATE USER <user_name> IDENTIFIED BY 'your_password_here';
Now, provide this user with read-only access to relevant schemas and tables:
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO <user_name>;
If you choose the STANDARD replication method (not recommended), only the SELECT permission is required.
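As a minimal sketch of how the two statements above vary with the replication method, the helper below assembles them as strings. The function name `build_user_sql` and its parameters are hypothetical, and the user name is quoted here as a precaution even though the statements above show it unquoted.

```python
import re
from typing import List


def build_user_sql(user_name: str, password: str, use_cdc: bool = True) -> List[str]:
    """Return CREATE USER and GRANT statements for a read-only replication user."""
    # Restrict the user name to simple characters so it cannot break the statement.
    if not re.fullmatch(r"[A-Za-z0-9_]+", user_name):
        raise ValueError("user_name may only contain letters, digits and underscores")
    # CDC needs the replication privileges; the STANDARD method only needs SELECT.
    privileges = (
        "SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT"
        if use_cdc
        else "SELECT"
    )
    # Escape backslashes and single quotes so the password stays a valid string literal.
    escaped = password.replace("\\", "\\\\").replace("'", "''")
    return [
        f"CREATE USER '{user_name}' IDENTIFIED BY '{escaped}';",
        f"GRANT {privileges} ON *.* TO '{user_name}';",
    ]


for statement in build_user_sql("airbyte", "your_password_here"):
    print(statement)
```

Passing `use_cdc=False` yields the reduced `GRANT SELECT` statement for the STANDARD method.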
Step 2: Enable binary logging on your MySQL server
You must enable binary logging for MySQL replication using CDC. Most cloud providers (AWS, GCP, etc.) provide easy one-click options for enabling the binlog on your source MySQL database.
If you are self-managing your MySQL server, configure your MySQL server configuration file with the following properties:
Configuring MySQL server config files to enable binlog
server-id = 223344
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
binlog_expire_logs_seconds = 864000
- server-id: The value of server-id must be unique for each server and replication client in the MySQL cluster. It must be non-zero and can be any value between 1 and 4294967295. If server-id is already set to a non-zero value, you don't need to change it. For more information, refer to the MySQL documentation.
- log_bin: The value of log_bin is the base name of the sequence of binlog files. If log_bin is already set, you don't need to change it. For more information, refer to the MySQL documentation.
- binlog_format: The binlog_format must be set to ROW. For more information, refer to the MySQL documentation.
- binlog_row_image: The binlog_row_image must be set to FULL. It determines how row images are written to the binary log. For more information, refer to the MySQL documentation.
- binlog_expire_logs_seconds: The number of seconds after which binlog files are removed automatically. We recommend 864000 seconds (10 days) so that, if a sync fails or is paused, enough binlog history remains to resume the incremental sync from the last synced position. We also recommend scheduling frequent syncs for CDC.
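The requirements in the list above can be checked mechanically before restarting the server. The following is a minimal sketch that assumes the options live in a `[mysqld]` section of a my.cnf-style file; the function name `check_binlog_settings` is hypothetical. It only inspects values that are set explicitly and knows nothing about server defaults or `!include` directives.

```python
import configparser
from typing import List

MAX_SERVER_ID = 4294967295  # upper bound for server-id stated in the list above


def check_binlog_settings(cnf_text: str) -> List[str]:
    """Return a list of problems found in the [mysqld] section of a my.cnf-style text."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]
    # MySQL accepts dashes and underscores interchangeably in option names.
    opts = {
        key.replace("-", "_"): (value or "").strip()
        for key, value in parser["mysqld"].items()
    }
    problems = []
    server_id = opts.get("server_id", "")
    if not (server_id.isdigit() and 1 <= int(server_id) <= MAX_SERVER_ID):
        problems.append("server-id is not explicitly set to a value in 1..4294967295")
    if "log_bin" not in opts:
        problems.append("log_bin is not explicitly set")
    if opts.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog_format is not explicitly set to ROW")
    if opts.get("binlog_row_image", "").upper() != "FULL":
        problems.append("binlog_row_image is not explicitly set to FULL")
    return problems
```

An empty list means every explicit setting matches the values recommended above.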
Step 3: Create a new MySQL source in Airbyte UI
From your Airbyte Cloud or Airbyte Open Source account, select Sources from the left navigation bar, search for MySQL, then create a new MySQL source.
To fill out the required information:
- Enter the hostname, port number, and name for your MySQL database.
- Enter the username and password you created in Step 1.
- Select an SSL mode. You will most frequently choose require or verify-ca. Both of these always require encryption. verify-ca also requires certificates from your MySQL database. See Connecting with SSL or SSH Tunneling below to learn about other SSL modes and SSH tunneling.
- Select Read Changes using Binary Log (CDC) from available replication methods.
Step 4: (Airbyte Cloud Only) Allow inbound traffic from Airbyte IPs.
If you are on Airbyte Cloud, you will always need to modify your database configuration to allow inbound traffic from Airbyte IPs. You can find a list of all IPs that need to be allowlisted in our Airbyte Security docs.
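When reviewing a firewall rule set, it can help to confirm that a given address falls inside the ranges you have allowlisted. The sketch below uses Python's standard `ipaddress` module; the function name `is_allowlisted` is hypothetical, and the ranges shown are documentation placeholders, not Airbyte's actual IPs (take those from the Airbyte Security docs).

```python
import ipaddress
from typing import List

# Placeholder ranges from the reserved documentation blocks, NOT Airbyte's real IPs.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.7"]


def is_allowlisted(client_ip: str, allowed_ranges: List[str]) -> bool:
    """Return True if client_ip falls inside any allowlisted address or CIDR range."""
    ip = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries such as "192.0.2.10/24" with host bits set.
    return any(
        ip in ipaddress.ip_network(entry, strict=False) for entry in allowed_ranges
    )


print(is_allowlisted("192.0.2.10", PLACEHOLDER_RANGES))
```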
Now, click Set up source in the Airbyte UI. Airbyte will now test connecting to your database. Once this succeeds, you've configured an Airbyte MySQL source!
MySQL Replication Modes
Change Data Capture (CDC)
Airbyte uses logical replication of the MySQL binlog to capture changes incrementally, including deletes. To learn more about how Airbyte implements CDC, refer to Change Data Capture (CDC). We generally recommend configuring your MySQL source with CDC whenever possible, as it provides:
- A record of deletions, if needed.
- Scalable replication to large tables (1 TB and more).
- A reliable cursor not reliant on the nature of your data. For example, if your table has a primary key but doesn't have a reasonable cursor field for incremental syncing (e.g. updated_at), CDC allows you to sync your table incrementally.
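To illustrate why a change log yields a record of deletions, here is a toy model that replays row-level change events against a destination keyed by primary key. The event shape (`op`, `pk`, `row`) and the function name `apply_change_events` are assumptions made for illustration; they are not Airbyte's record format or the binlog's wire format.

```python
from typing import Dict, List


def apply_change_events(destination: Dict[int, dict], events: List[dict]) -> Dict[int, dict]:
    """Replay insert, update and delete events onto a destination keyed by primary key."""
    for event in events:
        if event["op"] == "delete":
            # The delete is an explicit event, so the destination can drop the row.
            destination.pop(event["pk"], None)
        else:
            # Inserts and updates both carry the full row image.
            destination[event["pk"]] = event["row"]
    return destination


destination = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
events = [
    {"op": "update", "pk": 1, "row": {"id": 1, "name": "a2"}},
    {"op": "delete", "pk": 2},
    {"op": "insert", "pk": 3, "row": {"id": 3, "name": "c"}},
]
print(apply_change_events(destination, events))
```

No cursor column is involved: the order of events in the log is what drives the sync.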
Standard
Airbyte offers incremental replication using a custom cursor available in your source tables (e.g. updated_at). We generally recommend against this replication method, but it is well suited for the following cases:
- Your MySQL server does not expose the binlog.
- Your data set is small, and you just want a snapshot of your table in the destination.
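Cursor-based incremental replication can be pictured as repeatedly selecting rows whose cursor value is greater than the last one seen. The sketch below is a simplified in-memory model, not Airbyte's implementation; `cursor_sync` and the sample rows are hypothetical.

```python
from typing import List, Optional, Tuple


def cursor_sync(rows: List[dict], cursor_field: str, last_cursor: Optional[int]) -> Tuple[List[dict], Optional[int]]:
    """Return rows whose cursor value exceeds last_cursor, plus the new cursor value."""
    new_rows = [
        row for row in rows
        if last_cursor is None or row[cursor_field] > last_cursor
    ]
    # Keep the old cursor when nothing new was found.
    new_cursor = max((row[cursor_field] for row in new_rows), default=last_cursor)
    return new_rows, new_cursor


# First sync reads everything; the second sees only the updated row.
table = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
first_rows, cursor = cursor_sync(table, "updated_at", None)
table = [{"id": 1, "updated_at": 30}]  # row 2 was deleted, row 1 was updated
second_rows, cursor = cursor_sync(table, "updated_at", cursor)
print(second_rows, cursor)
```

In this model the deleted row never appears in the second result, which is one reason a cursor alone cannot report deletions.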
Connecting with SSL or SSH Tunneling
SSL Modes
Airbyte Cloud uses SSL by default. You are not permitted to disable SSL while using Airbyte Cloud.
Here is a breakdown of available SSL connection modes:
- disable: disable encrypted communication between Airbyte and the source
- allow: enable encrypted communication only when required by the source
- prefer: allow unencrypted communication only when the source doesn't support encryption
- require: always require encryption. Note: The connection will fail if the source doesn't support encryption.
- verify-ca: always require encryption and verify that the source has a valid SSL certificate
- verify-full: always require encryption and verify the identity of the source
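The breakdown above can be condensed into a small lookup that answers one question: does a given mode guarantee an encrypted connection? This is a sketch of the descriptions in the list only; the names `ENCRYPTION_POLICY` and `always_encrypted` are hypothetical and do not correspond to any Airbyte API.

```python
# Encryption behaviour per SSL mode, paraphrasing the list above.
ENCRYPTION_POLICY = {
    "disable": "never",
    "allow": "only if the source requires it",
    "prefer": "unless the source does not support it",
    "require": "always",
    "verify-ca": "always",
    "verify-full": "always",
}


def always_encrypted(mode: str) -> bool:
    """Return True if the SSL mode guarantees an encrypted connection."""
    if mode not in ENCRYPTION_POLICY:
        raise ValueError(f"unknown SSL mode: {mode}")
    return ENCRYPTION_POLICY[mode] == "always"


for mode in ENCRYPTION_POLICY:
    print(mode, always_encrypted(mode))
```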
Connection via SSH Tunnel
You can connect to a MySQL server via an SSH tunnel.
When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (also called a bastion or a jump server) that has direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server.
To connect to a MySQL server via an SSH tunnel:
- While setting up the MySQL source connector, from the SSH tunnel dropdown, select:
- SSH Key Authentication to use a private key as your secret for establishing the SSH tunnel
- Password Authentication to use a password as your secret for establishing the SSH Tunnel
- For SSH Tunnel Jump Server Host, enter the hostname or IP address for the intermediate (bastion) server that Airbyte will connect to.
- For SSH Connection Port, enter the port on the bastion server. The default port for SSH connections is 22.
- For SSH Login Username, enter the username to use when connecting to the bastion server. Note: This is the operating system username and not the MySQL username.
- For authentication:
- If you selected SSH Key Authentication, set the SSH Private Key to the private key that you are using to create the SSH connection.
- If you selected Password Authentication, enter the password for the operating system user to connect to the bastion server. Note: This is the operating system password and not the MySQL password.
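Airbyte opens the tunnel itself, but before filling in these fields it can be useful to confirm by hand that the bastion can reach the database. The sketch below builds an equivalent OpenSSH local port-forward command; `build_tunnel_command` and the host names are hypothetical placeholders, and 3306 is assumed as the MySQL port.

```python
from typing import List, Optional


def build_tunnel_command(
    bastion_host: str,
    ssh_user: str,
    db_host: str,
    db_port: int = 3306,
    local_port: int = 3306,
    ssh_port: int = 22,
    key_path: Optional[str] = None,
) -> List[str]:
    """Build an `ssh` argument list that forwards local_port to db_host:db_port via the bastion."""
    # -N: run no remote command; -L: forward a local port through the bastion.
    command = ["ssh", "-N", "-p", str(ssh_port), "-L", f"{local_port}:{db_host}:{db_port}"]
    if key_path:
        # -i selects the private key file for key-based authentication.
        command += ["-i", key_path]
    # ssh_user is the operating system user on the bastion, not the MySQL user.
    command.append(f"{ssh_user}@{bastion_host}")
    return command


print(" ".join(build_tunnel_command(
    "bastion.example.com", "airbyte", "mysql.internal.example", key_path="myuser_rsa"
)))
```

With the tunnel running, a MySQL client pointed at 127.0.0.1 on the local port should reach the database through the bastion.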
Generating a private key for SSH Tunneling
The connector expects an RSA key in PEM format. To generate this key:
ssh-keygen -t rsa -m PEM -f myuser_rsa
This produces the private key in PEM format, and the public key remains in the standard format used by the authorized_keys file on your bastion host. Add the public key to the bastion host under whichever user you want Airbyte to use. The private key is provided via copy-and-paste to the Airbyte connector configuration screen so that Airbyte can log in to the bastion.
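Before pasting the key, a quick format check can catch a key that was generated without `-m PEM`. The sketch below only inspects the first and last lines of the key text and does not validate the key material; `looks_like_rsa_pem` is a hypothetical helper, and the check assumes the PEM markers that `ssh-keygen -t rsa -m PEM` writes.

```python
RSA_PEM_HEADER = "-----BEGIN RSA PRIVATE KEY-----"
RSA_PEM_FOOTER = "-----END RSA PRIVATE KEY-----"


def looks_like_rsa_pem(key_text: str) -> bool:
    """Return True if key_text is wrapped in RSA PEM header and footer lines."""
    lines = key_text.strip().splitlines()
    if len(lines) < 3:
        return False
    return lines[0].strip() == RSA_PEM_HEADER and lines[-1].strip() == RSA_PEM_FOOTER


# Reads the file produced by the ssh-keygen command above, if it exists.
try:
    with open("myuser_rsa", encoding="utf-8") as key_file:
        print(looks_like_rsa_pem(key_file.read()))
except FileNotFoundError:
    print("myuser_rsa not found")
```

A key that starts with `-----BEGIN OPENSSH PRIVATE KEY-----` is in OpenSSH's newer default format and would fail this check.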