We typically see five primary methods for getting data into Timescale, depending on your needs and use case:

Bulk Upload and Migration

  1. Importing data from .csv 
  2. Migrating from an existing PostgreSQL database
  3. Migrating from another time-series database like InfluxDB  

Inserting Data

   1. Client drivers (such as JDBC, Python, and Node.js)
   2. Message queues, specifically Kafka

NOTE: Before you get started, first make sure that you have set up your Timescale Cloud instance and connected to it with a SQL client such as psql or pgAdmin.

1. Importing data from .csv

If you have a dataset stored in a .csv file, you can import it into an empty TimescaleDB table or a “hypertable”. A hypertable is a Timescale-specific abstraction of a single continuous table across all space and time partitions, but it can still be queried with standard SQL.

1.1 Define the schema for your new table

We need to define the schema for a new table before we insert data into it. Creating this empty table requires knowledge of the schema of the data in the file, but is otherwise the same as creating any new table. Our example uses a database named new_db and a data file named old_db.csv.

TIP: We’ve created a quick tool to automatically create your PostgreSQL or Timescale schema, found here on pgschema.com

First, you’ll need to use psql (or your preferred client/UI) to connect to the database. If you choose to use psql, connecting to Timescale Cloud is as simple as running:

psql '<service_url>'

You can find your service URL from within the cloud portal, on the “Overview” tab.

 
Next, let’s create a database in which we will store our new table.

CREATE DATABASE new_db;
\c new_db; 

This allows us to connect to the new database we just created. 

Finally, we create a new table with a defined schema that maps to our CSV file. 

-- Assuming the data file's columns are time, location, temperature

CREATE TABLE conditions (
    time        TIMESTAMPTZ         NOT NULL,
    location    text                NOT NULL,
    temperature DOUBLE PRECISION    NULL
);

1.2 Convert the new table into a hypertable

Now that we have a SQL table, we can (optionally) convert it into a hypertable for better performance. 

In Timescale Cloud, any new database you create needs to have the TimescaleDB extension loaded before you can create a hypertable.

CREATE EXTENSION timescaledb;
\dx

The \dx command lists the extensions currently installed in your database. You should see timescaledb listed there.

Then, convert the table you created earlier into a hypertable using the create_hypertable function.
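
For the conditions table created above, the conversion uses the time column as the time/partitioning column:

SELECT create_hypertable('conditions', 'time');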

1.3 Insert data into the hypertable

Recommended: Using timescaledb-parallel-copy

To bulk insert data into the new table, we recommend using our open-source Go program, which can speed up large data migrations by running multiple COPY operations concurrently. If you have TimescaleDB installed locally, your package manager should have installed timescaledb-parallel-copy. If not, you can go to our GitHub to install it separately.

 Here’s an example of how to use timescaledb-parallel-copy with 4 workers: 

timescaledb-parallel-copy \
  --connection '<service_url>' \
  --table taxi_data \
  --file ~/Downloads/jan_1.csv \
  --workers 4 \
  --copy-options "CSV" \
  --skip-header

Again, the service URL can be found in the cloud portal UI and should look something like this: 

postgres://tsdbadmin:<password>@<host>:20457/defaultdb?sslmode=require

In addition to parallelizing the workload, the tool also offers flags to improve the copy experience. See the repo on GitHub for full details.

TIP: We recommend not setting the number of workers higher than the number of available CPU cores on either your client machine or your server. Above that, the workers tend to compete with each other for resources and reduce the performance gains.

Using PostgreSQL's COPY

Although we recommend our open-source Go program for better bulk insert performance, we can also use PostgreSQL's bulk insert command COPY to copy data from the .csv file into our new database:

psql '<service_url>/new_db?sslmode=require' -c "\copy conditions FROM old_db.csv WITH (FORMAT CSV, HEADER)"

 This method is straightforward and requires no extra tools, but for large datasets it can be impractical and time-consuming because COPY is single-threaded. For a faster method that can utilize more of the CPU, use the previous method.

2. Client Drivers: JDBC, Python, and Node.js 


2.1 Python / psycopg

2.1.1 Installing and connecting psycopg

Before we begin, you should ensure that you have the psycopg2 library installed, which is a popular PostgreSQL adapter for Python. If not, you can run the following in your terminal:

pip install psycopg2

In our examples, we will be using Python 3.6 on macOS or Linux. If you are running an earlier version or are on a Windows machine, the commands should still be similar. With everything set up, let’s dive into connecting to your cloud database.

import psycopg2
from psycopg2.extras import RealDictCursor

url = '<service_url>'
db_conn = psycopg2.connect(url)
c = db_conn.cursor(cursor_factory=RealDictCursor)
c.execute("SELECT 1 = 1")
result = c.fetchone()

Again, the service URL can be found in the cloud portal UI and should look something like this: 

postgres://tsdbadmin:<password>@<host>:20457/defaultdb?sslmode=require

To create a table that fits your dataset, we suggest using psql or your preferred SQL client to first define the schema. After the schema is defined, you can use Python to query it.

TIP: We’ve created a quick tool to automatically create your PostgreSQL or Timescale schema, found here on pgschema.com

2.1.2 Inserting data

With our table created, it’s time to load data into the database.

A common way of loading data into a Postgres table is to issue an INSERT command on the table. The INSERT command requires the name of the table to insert into and the sequence of values to insert.

Many of our insert examples are written in Python. You can view them on our GitHub: https://github.com/timescale/examples.

One specific example using time series data is a sample dataset on air quality data:
https://github.com/timescale/examples/blob/master/air-quality/airquality_ingest.py
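
As a minimal sketch of that pattern (assuming the conditions table from section 1 and a CSV file with matching columns), an INSERT-based loader looks something like this:

import csv
import psycopg2

conn = psycopg2.connect('<service_url>')
cur = conn.cursor()

with open('old_db.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        # One INSERT per row: simple, but slow for large files.
        cur.execute(
            "INSERT INTO conditions (time, location, temperature) VALUES (%s, %s, %s)",
            row,
        )

conn.commit()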

While INSERT statements accomplish the task of loading in our data, they are actually not the most efficient way of doing it. As you can see, we had to loop through every single row of the file just to insert it into the database. Luckily for us, Postgres has a command built specifically for loading files into tables.

Recommended: using the COPY command

The Postgres command to load files directly into tables is called COPY. It takes in a file (like a CSV) and automatically loads it into a Postgres table. Instead of creating the query and then running it through execute() like INSERT, psycopg2 has a method written solely for this purpose.

The method to load a file into a table is called copy_from. Like the execute() method, it is attached to the Cursor object. However, it differs quite a bit from the execute() method due to its parameters.

copy_from requires a file object to load (without the header), the name of the table it should load into, and a delimiter (the keyword argument sep). Then, after running commit(), the file’s contents are transferred into the table. This is the most efficient, and recommended, way to load CSV files into a Postgres table.

This is how we use copy_from() to load our file instead of looping INSERT commands:

import psycopg2

conn = psycopg2.connect('<service_url>')
cur = conn.cursor()

with open('user_accounts.csv', 'r') as f:
    # Notice that we don't need the csv module.
    next(f)  # Skip the header row.
    cur.copy_from(f, 'users', sep=',')

conn.commit()

 

2.2 JDBC Driver

Two prerequisites you will need:

  • Install the Maven dependency for the PostgreSQL JDBC driver (you can also download the latest driver from postgresql.org):
<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <version>42.2.0</version>
</dependency>
  • Find your service CA certificate, which you can copy from the Timescale Cloud console's service page by clicking the "view CA certificate" button.

From there, input the following:

package pg;

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public final class Connect {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("jdbc.url", "jdbc:postgresql://pg-3b8d4ed6-myfirstcloudhub.aivencloud.com:20985/defaultdb");
    props.put("user", "avnadmin");
    props.put("password", "nr0dfnswz36xs9pi");
    props.put("ssl", "true");
    props.put("sslmode", "verify-ca");
    props.put("sslrootcert", "/path/to/ca.pem");

    try {
      Connection c = DriverManager.getConnection(props.getProperty("jdbc.url"), props);
      System.out.println("Success");
      c.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
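
Once connected, inserts go through standard JDBC prepared statements. Here is a minimal sketch, reusing the props and jdbc.url from the example above and assuming the conditions table from section 1:

try (Connection c = DriverManager.getConnection(props.getProperty("jdbc.url"), props);
     java.sql.PreparedStatement stmt = c.prepareStatement(
         "INSERT INTO conditions (time, location, temperature) VALUES (?, ?, ?)")) {
  stmt.setObject(1, java.time.OffsetDateTime.now()); // maps to TIMESTAMPTZ
  stmt.setString(2, "office");
  stmt.setDouble(3, 22.5);
  stmt.executeUpdate();
} catch (Exception e) {
  e.printStackTrace();
}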
 


2.3 ODBC Driver

Check out this link to psqlODBC, the official ODBC driver for PostgreSQL.
For reference, you will also need your service URL from within the cloud portal, on the “Overview” tab.

Additionally, if you need your service CA certificate, you can copy from the Timescale Cloud console's service page by clicking the "view CA certificate" button.
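
For reference, a typical psqlODBC connection string looks roughly like the following; the exact key names can vary slightly by driver version and platform, and the host, port, and password here are placeholders:

Driver={PostgreSQL Unicode};Server=<host>;Port=<port>;Database=defaultdb;Uid=tsdbadmin;Pwd=<password>;SSLmode=require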

2.4 Node.js

TIP: We recommend using the node-postgres library, though you could also use the Sequelize library.

2.4.1 Node.js using node-postgres

This example uses the node-postgres library. This is the simplest possible way to connect, query, and disconnect with async/await:

const { Client } = require('pg')
const client = new Client({ connectionString: '<service_url>' })
await client.connect()
const res = await client.query('SELECT $1::text as message', ['Hello world!'])
console.log(res.rows[0].message) // Hello world!
await client.end()

And here's the same thing with callbacks:

const { Client } = require('pg')
const client = new Client({ connectionString: '<service_url>' })
client.connect()
client.query('SELECT $1::text as message', ['Hello world!'], (err, res) => {
 console.log(err ? err.stack : res.rows[0].message) // Hello World!
 client.end()
})
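
Either style works for INSERTs as well. Here is a minimal parameterized-insert sketch, assuming the conditions table from section 1 and your service URL as the connection string:

const { Client } = require('pg')

const client = new Client({ connectionString: '<service_url>' })

async function insertReading() {
  await client.connect()
  // Parameterized values are passed separately, so pg handles the escaping.
  await client.query(
    'INSERT INTO conditions (time, location, temperature) VALUES ($1, $2, $3)',
    [new Date(), 'office', 22.5]
  )
  await client.end()
}

insertReading().catch(console.error)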

2.4.2 Node.js using Sequelize

This approach uses the Sequelize library. Two prerequisites you will need:

  • Install the pg-native package.
  • Find your service CA certificate, which you can copy from the Timescale Cloud console's service page by clicking the "view CA certificate" button.

From there, input the following:

const Sequelize = require('sequelize');
const database = 'defaultdb';
const username = 'tsdbadmin';
const password = 'nr0dfnswz36xs9pi';
const host = 'pg-3b8d4ed6-myfirstcloudhub.timescaledb.io';
const port = 20985;
const sslrootcert = '/path/to/ca.pem';

const sequelize = new Sequelize(database, username, password, {
  host,
  port,
  dialect: 'postgres',
  native: true,
  ssl: true,
  dialectOptions: {
    ssl: {
      sslmode: 'verify-ca',
      sslrootcert
    }
  },
});

sequelize.query('SELECT 1 AS value')
  .then(([results, metadata]) => console.log(results));
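
Raw inserts work the same way through sequelize.query. A short sketch, again assuming the conditions table from section 1 (Sequelize substitutes the ? placeholders from the replacements array):

sequelize.query(
  'INSERT INTO conditions (time, location, temperature) VALUES (?, ?, ?)',
  { replacements: [new Date(), 'office', 22.5] }
).then(() => console.log('Row inserted'));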


3. Message queues, specifically Kafka

The examples below are useful for inserting data into Timescale from a message queue, not for monitoring your Timescale instance using a message queue. For information on how to send data from Timescale to a message queue, check out this link.

3.1 PostgreSQL’s Kafka Connector

One popular method of ingesting data into TimescaleDB is to use the PostgreSQL connector with Kafka Connect. The connector is designed to work with Kafka Connect and to be deployed to a Kafka Connect runtime service. Its purpose is to ingest change events from PostgreSQL databases (including TimescaleDB).

The deployed connector will monitor one or more schemas within a TimescaleDB server and write all change events to Kafka topics, which can be independently consumed by one or more clients. Kafka Connect can be distributed to provide fault tolerance to ensure the connectors are running and continually keeping up with changes in the database.
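
As a rough sketch, a Kafka Connect worker registers the connector with a small properties (or JSON) configuration along these lines; the property names follow the Debezium PostgreSQL connector and vary slightly between versions, and every value here is a placeholder:

name=timescaledb-source
connector.class=io.debezium.connector.postgresql.PostgresConnector
plugin.name=pgoutput
database.hostname=<host>
database.port=<port>
database.user=tsdbadmin
database.password=<password>
database.dbname=defaultdb
# Logical name used as the Kafka topic prefix (newer Debezium versions call this topic.prefix)
database.server.name=tsdb
# Limit capture to the tables you care about
table.include.list=public.conditions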

TIP: The PostgreSQL connector can also be used as a library without Kafka or Kafka Connect, enabling applications and services to connect directly to TimescaleDB and obtain the ordered change events. This approach requires the application to record the progress of the connector so that, upon restart, the connector can continue where it left off. This may be useful for less critical use cases. For production use cases, however, we recommend using the connector with Kafka and Kafka Connect.

To start using the PostgreSQL connector, visit the GitHub page. If you are interested in an alternative method to ingest data from Kafka to TimescaleDB, you can download the StreamSets Data Collector and get started with this tutorial.

4. Migrating from an existing database instance to Timescale Cloud

Depending on where your data is currently stored, the steps to migrate it to Timescale Cloud are slightly different:

  1. Same database: If you want to set up Timescale in the same database and the same PostgreSQL instance as your stored data, follow these instructions.
  2. Different database: If you want to migrate data from a different database or a different PostgreSQL instance altogether, follow these instructions. This process uses the popular utility pg_dump (see the sketch at the end of this section).

One prerequisite you will need for either step is to set up and connect your preferred SQL client such as psql, pgAdmin, DBeaver, etc. If you choose to use psql, connecting to Timescale Cloud is as simple as running:

psql '<service_url>'

You can find your service URL from within the cloud portal, on the “Overview” tab.
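
If you take the pg_dump route (option 2 above), the overall flow looks roughly like this; the flags are illustrative, and the linked instructions cover the hypertable-specific steps (for example, running timescaledb_pre_restore() and timescaledb_post_restore() around the restore):

# Dump the source database in custom format
pg_dump -Fc -f old_db.bak '<source_database_url>'

# Restore into your Timescale Cloud service
pg_restore --no-owner -d '<service_url>' old_db.bak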

5. Migrating from InfluxDB

If you want to migrate data from InfluxDB, follow these instructions to use our tool called "Outflux".

Outflux is an open-source tool for batch migrating data from InfluxDB to TimescaleDB. Anyone who is currently running an InfluxDB instance can migrate their workload to TimescaleDB with a single command: outflux migrate. You must also have TimescaleDB installed and a means to connect to it. With Outflux, users can pipe exported data directly into TimescaleDB; Outflux manages schema discovery, validation, and creation.
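
A sketch of a typical invocation, assuming an InfluxDB database named mydb and a measurement named cpu (the --input-server and --output-conn flags shown here follow the Outflux README; check it for the full set of options):

outflux migrate mydb cpu \
  --input-server=http://localhost:8086 \
  --output-conn='<service_url>'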
