Datasets
Reference Datasets
These datasets are provided as a starting point for exploration and experimentation.
valency_anndata.datasets.aufstehen
Polis conversation of 33k+ Germans, run by the political party Aufstehen.
It ran in fall 2018 and is the largest Polis conversation run to date.
See: https://compdemocracy.org/Case-studies/2018-germany-aufstehen/
The data is pulled from an archive at: https://huggingface.co/datasets/patcon/polis-aufstehen-2018
Note
This dataset has been augmented by merging is-meta and is-seed statement
data (missing from the official CSV export) that were retrieved from the
Polis API. Specifically, is-meta is required in order to reproduce outputs
of the Polis data pipeline.
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r6xd526vyjyjrj9navxrj
Source code in src/valency_anndata/datasets/_load_aufstehen.py
valency_anndata.datasets.chile_protest
Polis conversation of 2,700+ Chileans during the 2019 #ChileDesperto protests.
It was run informally by a single citizen, with minimal support infrastructure, outreach strategy, or moderation process.
See: https://en.wikipedia.org/wiki/Social_Outburst_(Chile)
Note
This dataset has been augmented by merging is-meta and is-seed statement
data (missing from the official CSV export) that were retrieved from the
Polis API. Specifically, is-meta is required in order to reproduce outputs
of the Polis data pipeline.
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r29kkytnipymd3exbynkd
Source code in src/valency_anndata/datasets/_load_chile_protest.py
Polis
valency_anndata.datasets.polis.load
Load a Polis conversation or report into an AnnData object.
This function accepts either a URL or an ID for a Polis conversation or report,
fetches raw vote events and statements via the Polis API or CSV export, and
optionally constructs a participant × statement vote matrix in adata.X.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The Polis source to load. Supported formats include a report or conversation URL or ID, a URL on an alternative Polis instance, and a path containing Polis CSV export files. The function will automatically parse the source to determine whether it refers to a conversation or report and fetch the appropriate data. | required |
| translate_to | str or None | Target language code (e.g., "en", "fr", "es") for translating statement text. If provided, the original statement text in … | None |
| build_X | bool | If True, constructs a participant × statement vote matrix from the raw votes using … | True |
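The source parsing described for the source parameter can be pictured with a small sketch. This is not the library's parser: the helper name `classify_polis_source`, the default instance `https://pol.is`, and the rule that report IDs start with "r" while conversation IDs start with a digit are all assumptions, inferred from the example IDs on this page. Paths to CSV export folders are not handled here.

```python
from urllib.parse import urlparse


def classify_polis_source(source: str) -> dict:
    """Classify a Polis URL or bare ID (hypothetical helper, for illustration only)."""
    base_url = "https://pol.is"  # assumed default Polis instance
    identifier = source.strip()

    if identifier.startswith(("http://", "https://")):
        parsed = urlparse(identifier)
        # Remember the host so alternative Polis instances are preserved.
        base_url = f"{parsed.scheme}://{parsed.netloc}"
        # Use the last non-empty path segment, e.g. "/report/r2df..." -> "r2df..."
        segments = [segment for segment in parsed.path.split("/") if segment]
        if not segments:
            raise ValueError(f"No ID found in URL: {source!r}")
        identifier = segments[-1]

    # Assumption: report IDs begin with "r", conversation IDs begin with a digit.
    if identifier.startswith("r"):
        kind = "report"
    elif identifier[:1].isdigit():
        kind = "conversation"
    else:
        raise ValueError(f"Unrecognised Polis source: {source!r}")

    return {"kind": kind, "id": identifier, "base_url": base_url}


report = classify_polis_source("https://pol.is/report/r2dfw8eambusb8buvecjt")
conversation = classify_polis_source("6rphtwwfn4")
```

Returning the base URL alongside the ID keeps the sketch usable for self-hosted instances, where the same ID conventions would have to be assumed as well.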
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | An AnnData object containing the loaded Polis data. |
| | DataFrame | |
| | dict | |
| | DataFrame | |
| | dict | |
| | dict | |
| | dict | |
| | ndarray | |
| | DataFrame | |
| | DataFrame | |
| | AnnData | |
Notes
- If build_X=False, only adata.uns will be populated, containing the raw votes and statements, and .X, .obs, .var, and .raw will remain empty.
- adata.raw is assigned only after the first vote matrix build and is intended to be immutable.
- If translate_to is provided, adata.var["content"] is updated with translated text and adata.var["language_current"] is set to the target language.
- The vote matrix is derived from the most recent votes per participant per statement, sorted by timestamp.
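The last note, keeping only the most recent vote per participant per statement, can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the function name `build_vote_matrix`, the event keys ("participant", "statement", "vote", "timestamp"), and the use of NaN for unseen statements are all assumptions.

```python
import math


def build_vote_matrix(votes):
    """Build a participant x statement matrix from raw vote events.

    Each event is a dict with hypothetical keys:
    "participant", "statement", "vote", "timestamp".
    """
    latest = {}
    # Sorting by timestamp means a later vote overwrites an earlier one.
    for event in sorted(votes, key=lambda e: e["timestamp"]):
        latest[(event["participant"], event["statement"])] = event["vote"]

    participants = sorted({p for p, _ in latest})
    statements = sorted({s for _, s in latest})
    # Assumption: statements a participant never voted on are marked NaN.
    matrix = [
        [latest.get((p, s), math.nan) for s in statements]
        for p in participants
    ]
    return participants, statements, matrix


raw_votes = [
    {"participant": 0, "statement": 0, "vote": 1, "timestamp": 100},
    {"participant": 0, "statement": 0, "vote": -1, "timestamp": 200},  # changed mind
    {"participant": 1, "statement": 1, "vote": 0, "timestamp": 150},
]
participants, statements, matrix = build_vote_matrix(raw_votes)
```

In this toy input, participant 0 votes twice on statement 0, so only the later vote (-1) ends up in the matrix.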
Examples:
Load data from a report or conversation ID or URL.
import valency_anndata as val
adata = val.datasets.polis.load("https://pol.is/report/r2dfw8eambusb8buvecjt")
adata = val.datasets.polis.load("6rphtwwfn4")
Load data from an alternative Polis instance via URL.
Load data from a path containing Polis CSV export files.
Source code in src/valency_anndata/datasets/polis.py
valency_anndata.datasets.polis.translate_statements
translate_statements(
adata: AnnData,
translate_to: Optional[str],
inplace: bool = True,
) -> Optional[list[str]]
Translate statements in adata.uns['statements']['comment-body'] into another language,
or copy originals if translate_to is None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData object containing … | required |
| translate_to | Optional[str] | Target language code (e.g., "en", "fr", "es"). | required |
| inplace | bool | If True, updates … | True |
Returns:
| Name | Type | Description |
|---|---|---|
| translated_texts | list[str] \| None | List of translated texts if … |
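The in-place versus returned-value pattern suggested by the signature can be sketched as follows. This is an illustration under stated assumptions, not the library's code: a plain dict stands in for the AnnData object, the `translator` callable and the "content" key are hypothetical, and returning the list only when inplace=False is inferred from the Optional return type.

```python
from typing import Callable, Optional


def translate_statements_sketch(
    statements: dict,
    translate_to: Optional[str],
    translator: Callable[[str, str], str],
    inplace: bool = True,
) -> Optional[list]:
    """Translate statement texts, or copy the originals if translate_to is None."""
    originals = list(statements["comment-body"])
    if translate_to is None:
        texts = originals  # nothing to translate, copy the originals
    else:
        texts = [translator(text, translate_to) for text in originals]

    if inplace:
        # Assumption: in-place mode stores the texts and returns nothing.
        statements["content"] = texts
        return None
    return texts


def fake_translator(text: str, language: str) -> str:
    """Stand-in translator that only tags the text with the target language."""
    return f"[{language}] {text}"


statements = {"comment-body": ["Hallo", "Danke"]}
returned = translate_statements_sketch(statements, "en", fake_translator, inplace=False)
```

Passing the translator in as an argument keeps the sketch free of any real translation service, which the source does not name.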