
POD5 API Reader Reference

Tools for accessing POD5 data from PyArrow files

ArrowTableHandle

ArrowTableHandle(location: EmbeddedFileData, options: Optional[IpcReadOptions] = None)

Class for managing arrow file handles and memory view mapping of tables

Open a pod5 file at the given path and use the location data to load an arrow table (e.g. signal table)

Parameters:

Name Type Description Default
location EmbeddedFileData

Location data describing how a pod5 file should be split in memory to read a table. This is returned by the p5b.Pod5FileReader.get_file_X_location methods.

required
options Optional[IpcReadOptions]

Serialization options for reading IPC format.

None

Raises:

Type Description
Pod5ApiException

If the handle could not be opened.

reader property

reader: RecordBatchFileReader

Return the pyarrow file reader object

stream property

stream: Union[PythonFile, NativeFile]

Return the pyarrow file stream / backend

close

close() -> None

Cleanly close the open file handles and memory views.
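Because an ArrowTableHandle holds open file handles and memory views, callers should make sure close() runs even if reading fails. The sketch below uses only the documented reader property and close() method; count_record_batches is a hypothetical helper name, and num_record_batches is the standard pyarrow RecordBatchFileReader attribute.

```python
from typing import Any


def count_record_batches(handle: Any) -> int:
    """Return the number of record batches behind an ArrowTableHandle.

    Hypothetical helper: `handle` is assumed to expose the documented
    `reader` property and `close()` method.
    """
    try:
        # `reader` is a pyarrow RecordBatchFileReader over the embedded table
        return handle.reader.num_record_batches
    finally:
        # Release file handles and memory views even if reading fails
        handle.close()
```

Because the helper closes the handle, it should only be called once the handle is no longer needed.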

ReadRecord

ReadRecord(
    reader: Reader,
    batch: ReadRecordBatch,
    row: int,
    batch_signal_cache: Optional[List[NDArray[int16]]] = None,
    selected_batch_index: Optional[int] = None,
)

Represents the data for a single read from a pod5 record.

byte_count property

byte_count: int

Get the number of bytes used to store the read's data.

calibration property

calibration: Calibration

Get the calibration data associated with the read.

calibration_digitisation property

calibration_digitisation: int

Get the digitisation value used by the sequencer.

Intended to assist workflows ported from legacy file formats.

calibration_range property

calibration_range: float

Get the calibration range value.

Intended to assist workflows ported from legacy file formats.
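The two legacy-oriented values above can be related to the read's calibration scale. This sketch assumes, and you should verify against your installed pod5 version, that the digitisation is the inclusive ADC span adc_max - adc_min + 1 of the run (adc_min and adc_max being assumed RunInfo fields) and that the range is digitisation * calibration.scale. The helper name is hypothetical.

```python
from typing import Tuple


def legacy_calibration(adc_min: int, adc_max: int, scale: float) -> Tuple[int, float]:
    """Return (digitisation, range) for a read (hypothetical helper).

    Assumptions: digitisation is the inclusive ADC span of the run, and
    range = digitisation * calibration scale.
    """
    digitisation = adc_max - adc_min + 1
    return digitisation, digitisation * scale
```

Under these assumptions, legacy_calibration(read.run_info.adc_min, read.run_info.adc_max, read.calibration.scale) should agree with read.calibration_digitisation and read.calibration_range.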

end_reason property

end_reason: EndReason

Get the end reason data associated with the read.

end_reason_index property

end_reason_index: int

Get the dictionary index of the end reason data associated with the read. This property is the same as the EndReason enumeration value.

has_cached_signal property

has_cached_signal: bool

Get whether cached signal is available for this read.

median_before property

median_before: float

Get the median before level (in pico amps) for the read.

num_minknow_events property

num_minknow_events: int

Get the number of MinKNOW events in the read.

num_reads_since_mux_change property

num_reads_since_mux_change: int

Number of selected reads since the last mux change on this read's channel.

num_samples property

num_samples: int

Get the number of samples in the read's signal data.

open_pore_level property

open_pore_level: float

Get the open pore level for the read.

This is a float value representing the open pore level of the well prior to the read starting.

pore property

pore: Pore

Get the pore data associated with the read.

predicted_scaling property

predicted_scaling: ShiftScalePair

Find the predicted scaling value in the read.

read_id property

read_id: UUID

Get the unique read identifier for the read as a UUID.

read_number property

read_number: int

Get the integer read number of the read.

run_info property

run_info: RunInfo

Get the run info data associated with the read.

run_info_index property

run_info_index: int

Get the dictionary index of the run info data associated with the read.

sample_count property

sample_count: int

Get the number of samples in the read's signal data.

signal property

signal: NDArray[int16]

Get the full signal for the read.

Returns:

Type Description
ndarray[int16]

A numpy array of signal data with int16 type.

signal_pa property

signal_pa: NDArray[float32]

Get the full signal for the read, calibrated in pico amps.

Returns:

Type Description
ndarray[float32]

A numpy array of signal data in pico amps with float32 type.

signal_rows property

signal_rows: List[SignalRowInfo]

Get all signal rows for the read

Returns:

Type Description
list[SignalRowInfo]

A list of signal row data (as SignalRowInfo) in the read.

start_sample property

start_sample: int

Get the absolute sample index at which the read started.

time_since_mux_change property

time_since_mux_change: float

Time in seconds since the last mux change on this read's channel.

tracked_scaling property

tracked_scaling: ShiftScalePair

Find the tracked scaling value in the read.

calibrate_signal_array

calibrate_signal_array(signal_array_adc: NDArray[int16]) -> NDArray[float32]

Transform an array of int16 signal data from ADC space to pA.

Returns:

Type Description
ndarray[float32]

A numpy array of signal data in pico amps with float32 type.
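Converting from ADC space to pA is an affine transform using the read's calibration. The numpy sketch below assumes the pod5 convention pA = (adc + offset) * scale, with offset and scale taken from read.calibration; adc_to_pa is a hypothetical stand-in for illustration, not the library's implementation.

```python
import numpy as np
import numpy.typing as npt


def adc_to_pa(
    signal_adc: npt.NDArray[np.int16], offset: float, scale: float
) -> npt.NDArray[np.float32]:
    """Calibrate raw ADC samples to picoamps (hypothetical helper)."""
    # Work in float32 throughout so the result has the documented float32 dtype
    adc = signal_adc.astype(np.float32)
    return (adc + np.float32(offset)) * np.float32(scale)
```

If the assumption holds, np.allclose(adc_to_pa(read.signal, read.calibration.offset, read.calibration.scale), read.signal_pa) should be True for any ReadRecord.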

signal_for_chunk

signal_for_chunk(index: int) -> NDArray[int16]

Get the signal for a given chunk of the read.

Returns:

Type Description
ndarray[int16]

A numpy array of signal data with int16 type for the specified chunk.
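A read's signal can be stored across several signal-table rows (see signal_rows). This sketch rebuilds the full signal chunk by chunk, assuming chunk indices run from 0 to len(read.signal_rows) - 1 in storage order; under that assumption the result should equal read.signal. The helper name is hypothetical.

```python
from typing import Any

import numpy as np


def signal_by_chunks(read: Any) -> np.ndarray:
    """Concatenate every chunk of a ReadRecord's signal (hypothetical helper)."""
    chunks = [read.signal_for_chunk(i) for i in range(len(read.signal_rows))]
    if not chunks:
        # No signal rows: return an empty int16 array like an empty read
        return np.empty(0, dtype=np.int16)
    return np.concatenate(chunks)
```

Fetching chunks one at a time is mainly useful when a read is long and only part of its signal is needed; for the whole read, the signal property is simpler.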

to_read

to_read() -> Read

Create a mutable Read from this ReadRecord instance.

Returns:

Type Description
Read

A mutable Read instance built from this record's data.

ReadRecordBatch

ReadRecordBatch(reader: Reader, batch: RecordBatch)

Read data for a batch of reads.

cached_sample_count_column property

cached_sample_count_column: NDArray[uint64]

Get the sample counts from the cached signal data

cached_samples_column property

cached_samples_column: List[NDArray[int16]]

Get the samples column from the cached signal data

columns property

columns: ReadRecordV4Columns

Return the data from this batch as a ReadRecordV4Columns instance.

num_reads property

num_reads: int

Return the number of rows in this RecordBatch

read_id_column property

read_id_column: FixedSizeBinaryArray

Get the column of read ids for this batch

read_number_column property

read_number_column: UInt32Array

Get the column of read numbers for this batch

get_read

get_read(row: int) -> ReadRecord

Get the ReadRecord at the given row index.

reads

reads() -> Generator[ReadRecord, None, None]

Iterate all reads in this batch.

Yields:

Type Description
ReadRecord

ReadRecord instances in this batch.
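ReadRecordBatch offers both column access and per-read access. The sketch below fetches read numbers either from read_number_column in one call (to_pylist() is the standard pyarrow Array method) or by iterating reads(). The helper name is hypothetical. If set_selected_batch_rows has been called, reads() may yield only the selected rows while the column still covers the whole batch, so the two paths are only expected to agree on unfiltered batches.

```python
from typing import Any, List


def read_numbers(batch: Any, use_columns: bool = True) -> List[int]:
    """Return the read numbers in a ReadRecordBatch (hypothetical helper)."""
    if use_columns:
        # A single conversion of the pyarrow UInt32Array column
        return batch.read_number_column.to_pylist()
    # Per-read access through ReadRecord instances
    return [record.read_number for record in batch.reads()]
```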

set_cached_signal

set_cached_signal(signal_cache: Pod5SignalCacheBatch) -> None

Set the signal cache

set_selected_batch_rows

set_selected_batch_rows(selected_batch_rows: Iterable[int]) -> None

Set the selected batch rows

Reader

Reader(path: PathOrStr)

The base reader for POD5 data

Open a pod5 filepath for reading

batch_count property

batch_count: int

Find the number of read batches available in the file.

file_version property

file_version: Version

The version of pod5 that originally generated this file. This is not updated when the file is updated.

file_version_pre_migration property

file_version_pre_migration: Version

The version of pod5 that is stored with the file on disk.

inner_file_reader property

inner_file_reader: Pod5FileReader

Access the inner c_api Pod5FileReader - use with caution

is_vbz_compressed property

is_vbz_compressed: bool

Return whether this file's signal is VBZ-compressed.

num_reads property

num_reads: int

Find the number of reads in the file.

path property

path: Path

Return the path to this pod5 file

read_ids property

read_ids: List[str]

Return all read_ids as a list of strings.

For the most performant implementation, consider Reader.read_ids_raw.

read_ids_raw property

read_ids_raw: ChunkedArray

Return chunked arrow array of read ids.

To get read ids as strings, use Reader.read_ids.
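Read ids in POD5 are 16-byte UUIDs, so raw values can be turned into canonical UUID strings with the standard uuid module when only some of them need converting. This sketch assumes reader.read_ids_raw.to_pylist() yields bytes objects (pyarrow's behaviour for fixed-size binary columns) and that the resulting strings should match the form returned by read_ids. The helper name is hypothetical.

```python
import uuid
from typing import Iterable, List


def raw_ids_to_strings(raw_ids: Iterable[bytes]) -> List[str]:
    """Convert 16-byte read ids to canonical UUID strings (hypothetical helper)."""
    return [str(uuid.UUID(bytes=raw)) for raw in raw_ids]
```

Usage, under the same assumption: raw_ids_to_strings(reader.read_ids_raw.to_pylist()[:10]) converts only the first ten ids.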

read_table property

read_table: RecordBatchFileReader

Access the pod5 read table

run_info_table property

run_info_table: RecordBatchFileReader

Access the pod5 run_info table

signal_batch_row_count property

signal_batch_row_count: int

Return signal batch row count

signal_table property

signal_table: RecordBatchFileReader

Access the pod5 signal table - use with caution

__iter__

__iter__() -> Generator[ReadRecord, None, None]

Iterate over all reads

close

close() -> None

Close file handles.

get_batch

get_batch(index: int) -> ReadRecordBatch

Get a read batch in the file.

Returns:

Type Description
ReadRecordBatch

The requested batch as a ReadRecordBatch.
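batch_count and get_batch allow random access by batch index, which can be handy for splitting work across processes. This sketch lists the number of reads in each batch using only the properties documented here; the helper name is hypothetical.

```python
from typing import Any, List


def reads_per_batch(reader: Any) -> List[int]:
    """Return num_reads for each batch of a pod5 Reader (hypothetical helper)."""
    return [reader.get_batch(i).num_reads for i in range(reader.batch_count)]
```

For a real file, sum(reads_per_batch(reader)) would be expected to equal reader.num_reads.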

read_batches

read_batches(
    selection: Optional[List[str]] = None,
    batch_selection: Optional[Iterable[int]] = None,
    missing_ok: bool = False,
    preload: Optional[Set[str]] = None,
) -> Generator[ReadRecordBatch, None, None]

Iterate batches in the file, optionally selecting certain rows.

Parameters:

Name Type Description Default
selection iterable[str]

The read ids to walk in the file.

None
batch_selection iterable[int]

The read batches to walk in the file.

None
missing_ok bool

If False, an error is raised when selection contains entries not found in the file. Set to True to allow missing entries.

False
preload set[str]

Columns to preload - "samples" and "sample_count" are valid values

None

Returns:

Type Description
Generator[ReadRecordBatch, None, None]

A generator yielding ReadRecordBatch instances.
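When only signal lengths are needed, preloading "sample_count" should let each yielded batch expose its counts through cached_sample_count_column. This sketch sums those counts over an optional set of batch indices; it assumes the preloaded column is populated on every yielded batch, and total_samples is a hypothetical helper name.

```python
from typing import Any, Iterable, Optional


def total_samples(reader: Any, batches: Optional[Iterable[int]] = None) -> int:
    """Sum sample counts over the selected batches (hypothetical helper)."""
    total = 0
    for batch in reader.read_batches(batch_selection=batches, preload={"sample_count"}):
        # One count per read in the batch (a uint64 array for a real file)
        total += sum(int(n) for n in batch.cached_sample_count_column)
    return total
```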

reads

reads(
    selection: Optional[Iterable[str]] = None,
    missing_ok: bool = False,
    preload: Optional[Set[str]] = None,
) -> Generator[ReadRecord, None, None]

Iterate reads in the file, optionally filtering for certain read ids.

Parameters:

Name Type Description Default
selection iterable[str]

The read ids to walk in the file.

None
missing_ok bool

If False, an error is raised when selection contains entries not found in the file. Set to True to allow missing entries.

False
preload set[str]

Columns to preload - "samples" and "sample_count" are valid values

None

Returns:

Type Description
Generator[ReadRecord, None, None]

A generator yielding ReadRecords
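A typical selective scan asks for specific read ids, tolerates ids that are absent from the file, and preloads only what it needs. This sketch maps each found read id to its sample count; it assumes ids missing from the file are skipped when missing_ok=True, and it does not rely on reads being yielded in selection order. The helper name is hypothetical; with a real file, pass a pod5 Reader and call close() when done.

```python
from typing import Any, Dict, Iterable


def samples_by_read_id(reader: Any, wanted: Iterable[str]) -> Dict[str, int]:
    """Map each found read id to its sample count (hypothetical helper)."""
    counts: Dict[str, int] = {}
    # missing_ok=True: ids not present in the file should not raise an error
    for record in reader.reads(selection=wanted, missing_ok=True, preload={"sample_count"}):
        # read_id is a UUID; key the result by its string form
        counts[str(record.read_id)] = record.num_samples
    return counts
```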