Reading and Writing

Reading and writing AnnData objects.

Reading

Use anndata.io.read_h5ad to load a previously saved .h5ad file, or val.datasets.polis.load to import directly from a Polis conversation.

Writing

valency_anndata.write

write(
    filename: Path | str,
    adata: AnnData,
    *,
    include: Sequence[str] | None = None,
    ext: Literal["h5", "csv", "txt", "npz"] | None = None,
    compression: Literal["gzip", "lzf"] | None = "gzip",
    compression_opts: int | None = None,
) -> None

Write an AnnData object to file with automatic sanitization.

Wraps scanpy.write but first copies and sanitizes adata so that problematic fields (mixed-type uns["statements"] columns, categorical kmeans_* labels with NA) do not cause serialization errors.
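
The copy-then-sanitize idea can be illustrated with a minimal standard-library sketch. It is not the library's `_sanitize_for_export`, whose body is not shown on this page: a plain dict stands in for `uns["statements"]`, the helper name `sanitize_copy` is hypothetical, and coercing mixed-type columns to strings is an assumed strategy chosen only for illustration.

```python
import copy


def sanitize_copy(uns: dict) -> dict:
    """Return a sanitized deep copy of `uns`; the input is left untouched.

    Hypothetical stand-in for the real sanitizer, which is not shown here.
    """
    out = copy.deepcopy(uns)
    for column, values in out["statements"].items():
        # Collect the types of all non-missing values in this column.
        kinds = {type(v) for v in values if v is not None}
        if len(kinds) > 1:
            # Mixed types: coerce every non-missing value to str (assumed strategy).
            out["statements"][column] = [
                None if v is None else str(v) for v in values
            ]
    return out


# Toy data: "author" mixes int and str, "id" is uniformly int.
uns = {"statements": {"id": [1, 2, 3], "author": [7, "anon", None]}}
clean = sanitize_copy(uns)
```

Because the sanitizer works on a deep copy, `uns` keeps its original mixed-type column, while `clean` holds the version that would be handed to the writer.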

Parameters:

Name	Type	Description	Default
`filename`	`Path \| str`	Output path. If the filename has no file extension it is interpreted the same way as scanpy.write.	required
`adata`	`AnnData`	Annotated data matrix. Not mutated — a sanitized copy is written.	required
`include`	`Sequence[str] \| None`	When not `None`, only the listed `"namespace/key"` paths are kept in the written file. Glob patterns are supported (e.g. `"obsm/X_*"`, `"obs/kmeans_*"`). Valid namespaces are `obs`, `var`, `obsm`, `varm`, `layers`, `uns`, `obsp`, and `varp`. `X` and `raw` are always retained.	`None`
`ext`	`Literal['h5', 'csv', 'txt', 'npz'] \| None`	File extension from which to infer file format.	`None`
`compression`	`Literal['gzip', 'lzf'] \| None`	See the h5py dataset docs (https://docs.h5py.org/en/latest/high/dataset.html).	`'gzip'`
`compression_opts`	`int \| None`	See the h5py dataset docs (https://docs.h5py.org/en/latest/high/dataset.html).	`None`
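
To make the `include` semantics concrete, here is a rough sketch of how `"namespace/key"` glob patterns could be matched against the keys of an object. It assumes matching in the style of Python's `fnmatch` module; the real `_filter_adata` is not shown on this page and may differ. The helper `select_keys`, the `available` mapping, and the key names `n_votes` and `X_umap` are hypothetical.

```python
from fnmatch import fnmatchcase


def select_keys(
    available: dict[str, list[str]], include: list[str]
) -> dict[str, list[str]]:
    """Keep only the keys whose "namespace/key" path matches a pattern."""
    kept: dict[str, list[str]] = {}
    for namespace, keys in available.items():
        matched = [
            key
            for key in keys
            # Build the full path and test it against every pattern.
            if any(fnmatchcase(f"{namespace}/{key}", pat) for pat in include)
        ]
        if matched:
            kept[namespace] = matched
    return kept


# Hypothetical contents of an AnnData object, listed per namespace.
available = {
    "obs": ["kmeans_2", "kmeans_3", "n_votes"],
    "obsm": ["X_pca", "X_pacmap", "X_umap"],
    "uns": ["statements"],
}
selected = select_keys(
    available, ["obsm/X_pca", "obsm/X_pacmap", "obs/kmeans_*", "uns/*"]
)
```

With these patterns, `selected` keeps both `kmeans_*` columns, the two named embeddings, and everything under `uns`, while `n_votes` and `X_umap` are dropped.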

Examples:

Basic — write everything:

val.write("conversation.h5ad", adata)

Advanced — selectively include keys with glob patterns:

val.write(
    "export.h5ad",
    adata,
    include=["obsm/X_pca", "obsm/X_pacmap", "obs/kmeans_*", "uns/*"],
)

Source code in src/valency_anndata/_write.py

def write(
    filename: Path | str,
    adata: AnnData,
    *,
    include: Sequence[str] | None = None,
    ext: Literal["h5", "csv", "txt", "npz"] | None = None,
    compression: Literal["gzip", "lzf"] | None = "gzip",
    compression_opts: int | None = None,
) -> None:
    """Write an [AnnData][anndata.AnnData] object to file with automatic sanitization.

    Wraps [scanpy.write][] but first copies and sanitizes `adata` so that
    problematic fields (mixed-type ``uns["statements"]`` columns, categorical
    ``kmeans_*`` labels with ``NA``) do not cause serialization errors.

    Parameters
    ----------
    filename
        Output path.  If the filename has no file extension it is interpreted
        the same way as [scanpy.write][].
    adata
        Annotated data matrix.  **Not** mutated — a sanitized copy is written.
    include
        When not ``None``, only the listed ``"namespace/key"`` paths are kept
        in the written file.  Glob patterns are supported
        (e.g. ``"obsm/X_*"``, ``"obs/kmeans_*"``).  Valid namespaces are
        ``obs``, ``var``, ``obsm``, ``varm``, ``layers``, ``uns``, ``obsp``,
        and ``varp``.  ``X`` and ``raw`` are always retained.
    ext
        File extension from which to infer file format.
    compression
        See `h5py dataset docs <https://docs.h5py.org/en/latest/high/dataset.html>`_.
    compression_opts
        See `h5py dataset docs <https://docs.h5py.org/en/latest/high/dataset.html>`_.

    Examples
    --------
    Basic — write everything:

    ```py
    val.write("conversation.h5ad", adata)
    ```

    Advanced — selectively include keys with glob patterns:

    ```py
    val.write(
        "export.h5ad",
        adata,
        include=["obsm/X_pca", "obsm/X_pacmap", "obs/kmeans_*", "uns/*"],
    )
    ```
    """
    sanitized = _sanitize_for_export(adata)
    if include is not None:
        _filter_adata(sanitized, include)
    sc.write(
        filename,
        sanitized,
        ext=ext,
        compression=compression,
        compression_opts=compression_opts,
    )