view

Field

Bases: NamedTuple

Container class for storing the polars expression for a named field

assert_unique_acquisition_id

assert_unique_acquisition_id(run_info: LazyFrame, path: Path) -> None

Check that the acquisition ids are unique, raising an AssertionError otherwise
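
The uniqueness check can be illustrated without polars. The following is a minimal sketch, not the module's implementation: `check_unique_acquisition_ids` is a hypothetical helper that takes a plain iterable of ids rather than a LazyFrame.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def check_unique_acquisition_ids(acquisition_ids: Iterable[str], path: Path) -> None:
    """Hypothetical helper: raise AssertionError if any id occurs more than once."""
    counts = Counter(acquisition_ids)
    # Collect every id that was seen more than once, sorted for a stable message.
    duplicates = sorted(aid for aid, n in counts.items() if n > 1)
    if duplicates:
        raise AssertionError(f"Duplicate acquisition ids in {path}: {duplicates}")
```

Including the path in the message identifies the offending file when many inputs are processed.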

format_view_table

format_view_table(
    lazyframe: LazyFrame, path: Path, selected_fields: Set[str]
) -> LazyFrame

Format the view table based on the selected fields

get_field_or_raise

get_field_or_raise(key: str) -> Field

Get the Field for this key or raise a KeyError
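
A lookup-or-raise helper over a table of named fields might look like the sketch below. The `FieldSketch` class, the `FIELDS` table and its entries are assumptions for illustration. The real Field stores a polars expression, which is replaced here by a plain string so the example needs only the standard library.

```python
from typing import Dict, NamedTuple


class FieldSketch(NamedTuple):
    # Hypothetical stand-in: a string takes the place of the polars expression.
    expr: str
    docs: str


# Hypothetical field table; the real module defines its own set of fields.
FIELDS: Dict[str, FieldSketch] = {
    "read_id": FieldSketch("read_id", "Read identifier"),
    "channel": FieldSketch("channel", "Channel number"),
}


def lookup_field_or_raise(key: str) -> FieldSketch:
    """Return the field for key, or raise KeyError listing the known fields."""
    try:
        return FIELDS[key]
    except KeyError:
        raise KeyError(f"Unknown field '{key}'. Known fields: {sorted(FIELDS)}") from None
```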

get_reads_tables

get_reads_tables(
    path: Path, selected_fields: Set[str], threshold: int = 100000
) -> Generator[LazyFrame, None, None]

Generate lazy dataframes from pod5 records. If the number of records is greater than threshold, yield chunks to limit memory consumption and improve overall performance.

join_reads_to_run_info

join_reads_to_run_info(reads: LazyFrame, run_info: LazyFrame) -> LazyFrame

Join the reads and run_info tables

join_workers

join_workers(processes: List[SpawnProcess], exceptions: JoinableQueue) -> None

Poll workers checking for exceptions which will likely cause

parse_read_table_chunks

parse_read_table_chunks(
    reader: Reader, approx_size: int = 99999
) -> Generator[LazyFrame, None, None]

Read record batches and yield polars lazyframes of approx_size records. Records are yielded in units of whole batches of the underlying table
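
Yielding in units of whole batches means a chunk can exceed approx_size, because a batch is never split. The sketch below shows this grouping on plain Python sequences. `chunk_whole_batches` is a hypothetical helper, and lists of integers stand in for record batches and LazyFrames.

```python
from typing import Iterable, Iterator, List, Sequence


def chunk_whole_batches(
    batches: Iterable[Sequence[int]], approx_size: int
) -> Iterator[List[int]]:
    """Hypothetical helper: group whole batches into chunks of about approx_size."""
    chunk: List[int] = []
    for batch in batches:
        # Always take the entire batch, so a chunk may overshoot approx_size.
        chunk.extend(batch)
        if len(chunk) >= approx_size:
            yield chunk
            chunk = []
    # Emit whatever is left after the final batch.
    if chunk:
        yield chunk
```

With batches of 3, 2, 4 and 1 records and an approx_size of 4, this yields chunks of 5, 4 and 1 records.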

parse_reads_table_all

parse_reads_table_all(reader: Reader) -> LazyFrame

Parse all records in the reads table returning a polars LazyFrame

parse_reads_table_batch

parse_reads_table_batch(reader: Reader, batch_index: int) -> Tuple[LazyFrame, int]

Parse the reads table record batch at batch_index from a pod5 file returning a polars LazyFrame and the number of records in it

parse_run_info_table

parse_run_info_table(reader: Reader) -> LazyFrame

Parse the run info table from a pod5 file, returning a polars LazyFrame

print_fields

print_fields()

Print a list of the available columns

resolve_output

resolve_output(output: Optional[Path], force_overwrite: bool) -> Optional[Path]

Resolve the output path if necessary, checking for accidental overwrites and resolving to the default output if given a path
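
One way such a resolution step could work is sketched below. Several details are assumptions that the description does not confirm: that None means "no output file", that a directory resolves to a default file name inside it, that `DEFAULT_NAME` has the value shown, and that an existing file raises FileExistsError unless force_overwrite is set.

```python
from pathlib import Path
from typing import Optional

DEFAULT_NAME = "output.tsv"  # hypothetical default file name


def resolve_output_sketch(output: Optional[Path], force_overwrite: bool) -> Optional[Path]:
    """Hypothetical sketch of resolving an output path with an overwrite guard."""
    if output is None:
        # Assumption: no path means the caller writes elsewhere (e.g. stdout).
        return None
    if output.is_dir():
        # Assumption: a directory resolves to a default file inside it.
        output = output / DEFAULT_NAME
    if output.exists() and not force_overwrite:
        raise FileExistsError(f"{output} exists; pass force_overwrite=True to replace it")
    return output
```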

select_fields

select_fields(
    *,
    group_read_id: bool = False,
    include: Optional[str] = None,
    exclude: Optional[str] = None
) -> Set[str]

Select fields to write
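
The sketch below shows one plausible reading of the include and exclude parameters, assuming each is a comma-separated string of field names. That format, the `ALL_FIELDS` tuple and the helper names are assumptions, and group_read_id is left out because its effect is not described here.

```python
from typing import Optional, Set

ALL_FIELDS = ("read_id", "channel", "mux")  # hypothetical field names


def parse_field_list(text: str) -> Set[str]:
    """Split a comma-separated string into a set of stripped, non-empty names."""
    return {token.strip() for token in text.split(",") if token.strip()}


def select_fields_sketch(
    *, include: Optional[str] = None, exclude: Optional[str] = None
) -> Set[str]:
    """Hypothetical sketch: start from include (or all fields), then drop exclude."""
    selected = set(ALL_FIELDS) if include is None else parse_field_list(include)
    unknown = selected - set(ALL_FIELDS)
    if unknown:
        raise KeyError(f"Unknown fields: {sorted(unknown)}")
    if exclude is not None:
        selected -= parse_field_list(exclude)
    return selected
```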

view_pod5

view_pod5(
    inputs: List[Path],
    output: Path,
    separator: str = "\t",
    recursive: bool = False,
    force_overwrite: bool = False,
    list_fields: bool = False,
    no_header: bool = False,
    threads: int = DEFAULT_THREADS,
    **kwargs
) -> None

Given a list of POD5 files write a table to view their contents

worker_process

worker_process(
    paths: JoinableQueue,
    exceptions: JoinableQueue,
    lock: Lock,
    output: Path,
    separator: bool,
    selection: Set[str],
) -> None

Consume pod5 paths from the paths queue, parse the records, and write to output after acquiring the lock. Returns when the finish sentinel None is received on the paths queue.
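
The sentinel-terminated consumer pattern described above can be sketched with the standard library. This sketch uses threads and `queue.Queue` rather than spawned processes and a JoinableQueue, to keep it short. `worker_sketch` and `run_sketch` are hypothetical names, and the "parse" step is a placeholder string.

```python
import queue
import threading
from typing import List, Optional


def worker_sketch(
    paths: "queue.Queue[Optional[str]]",
    exceptions: "queue.Queue[Exception]",
    lock: threading.Lock,
    results: List[str],
) -> None:
    """Consume paths until the None sentinel arrives, writing under the lock."""
    while True:
        path = paths.get()
        try:
            if path is None:
                return  # finish sentinel received
            line = f"parsed:{path}"  # placeholder for parsing the file
            with lock:
                results.append(line)  # the shared output is written under the lock
        except Exception as exc:
            exceptions.put(exc)  # report the failure to the parent
            return
        finally:
            paths.task_done()


def run_sketch(items: List[str], n_workers: int = 2) -> List[str]:
    paths: "queue.Queue[Optional[str]]" = queue.Queue()
    exceptions: "queue.Queue[Exception]" = queue.Queue()
    lock = threading.Lock()
    results: List[str] = []
    workers = [
        threading.Thread(target=worker_sketch, args=(paths, exceptions, lock, results))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for item in items:
        paths.put(item)
    for _ in workers:
        paths.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return sorted(results)
```

One sentinel is queued per worker so that every consumer sees its own stop signal.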

write

write(ldf: LazyFrame, output: Optional[Path], separator: str = '\t') -> None

Write the polars.LazyFrame

write_header

write_header(output: Optional[Path], selected: Set[str], separator: str = '\t') -> None

Write the header line