Use
To use pgcopy, instantiate a copy manager for your table, then pass it some data:

    from datetime import datetime
    from pgcopy import CopyManager
    import psycopg2

    cols = ('id', 'timestamp', 'location', 'temperature')
    now = datetime.now()
    records = [
        (0, now, 'Jerusalem', 72.2),
        (1, now, 'New York', 75.6),
        (2, now, 'Moscow', 54.3),
    ]
    conn = psycopg2.connect(database='weather_db')
    mgr = CopyManager(conn, 'measurements_table', cols)
    mgr.copy(records)

    # don't forget to commit!
    conn.commit()
class pgcopy.CopyManager(conn, table, cols)

    Facility for bulk-loading data using binary copy. Inspects the
    database on instantiation for the column types.
    Parameters:
        conn (psycopg2 connection) – a database connection
        table (str) – the table name. Schema may be specified using
            dot notation: schema.table
        cols (list of str) – columns in the table into which to copy data

    Raises:
        ValueError – if the table or columns do not exist.
copy(data[, fobject_factory])

    Copy data into the database using a temporary file.
    Parameters:
        data (iterable of iterables) – the data to be inserted
        fobject_factory (function) – a tempfile factory
    Data is serialized first in its entirety and then sent to the
    database. By default, a temporary file on disk is used. If you have
    enough memory, you can get a slight performance benefit with
    in-memory storage:

        from io import BytesIO
        mgr.copy(records, BytesIO)
    For very large datasets, serialization can be done directly to the
    database connection using threading_copy(). In most circumstances,
    however, data transfer over the network and db processing take
    significantly more time than writing and reading a temporary file
    on a local disk.

    ValueError is raised if a null value is provided for a column with
    a non-null constraint.
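    As a rough illustration of what fobject_factory must provide
    (assuming, as BytesIO and the default temporary-file behavior
    suggest, that it is a zero-argument callable returning a seekable
    binary file object), this stdlib-only sketch exercises both the
    in-memory and disk-backed variants. The byte string written is a
    stand-in for the binary COPY stream, not pgcopy's actual format:

        from io import BytesIO
        import tempfile

        results = []
        for factory in (BytesIO, tempfile.TemporaryFile):
            f = factory()        # called with no arguments
            f.write(b'PGCOPY')   # stand-in for the serialized COPY data
            f.seek(0)            # pgcopy must rewind before sending
            results.append(f.read())
            f.close()

    Both factories behave identically from the caller's side; the
    trade-off is memory use versus disk I/O.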
threading_copy(data)

    Copy data, serializing directly to the database.

    Parameters:
        data (iterable of iterables) – the data to be inserted
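    Because an "iterable of iterables" need not be a list, a generator
    can supply the rows; this sketch (reusing the columns from the Use
    example above, with a fixed timestamp for illustration) builds the
    rows lazily:

        from datetime import datetime

        cols = ('id', 'timestamp', 'location', 'temperature')

        def measurements():
            # A generator is a valid "iterable of iterables": rows are
            # produced one at a time instead of held in memory at once.
            now = datetime(2024, 1, 1)
            cities = [('Jerusalem', 72.2), ('New York', 75.6), ('Moscow', 54.3)]
            for i, (city, temp) in enumerate(cities):
                yield (i, now, city, temp)

        rows = list(measurements())   # materialized here only to inspect the shape

    With the mgr from the Use example, mgr.threading_copy(measurements())
    would consume these rows without building the intermediate list.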