Datasets
Data Overview
| Dataset | Participants [1] | Statements [2] | Completeness [3] | Fingerprint |
|---|---|---|---|---|
| vTaiwan: Uber (Jun 2015) | 1,324 / 2,229 | 101 / 199 | 54% / 52% / 46% / 38% (26 / 51 / 76 / 101) | (image) |
| vTaiwan: Airbnb (Aug 2015) | 864 / 1,675 | 237 / 245 | 26% / 18% / 14% / 11% (60 / 119 / 178 / 237) | (image) |
| vTaiwan: Online Alcohol Sales (Mar 2016) | 374 / 639 | 72 / 72 | 47% / 42% / 38% / 32% (18 / 36 / 54 / 72) | (image) |
| vTaiwan: Caning (Nov 2017) | 1,852 / 2,194 | 340 / 602 | 21% / 14% / 10% / 12% (85 / 170 / 255 / 340) | (image) |
| American Assembly: Bowling Green (Feb 2018) | 1,583 / 2,044 | 633 / 896 | 39% / 34% / 27% / 22% (159 / 317 / 475 / 633) | (image) |
| Aufstehen (Sep 2018) | 23,354 / 33,422 | 161 / 783 | 62% / 58% / 53% / 52% (41 / 81 / 121 / 161) | (image) |
| American Assembly: Louisville (Mar 2019) | 1,163 / 1,398 | 603 / 877 | 27% / 25% / 21% / 18% (151 / 302 / 453 / 603) | (image) |
| Chile Protests (Nov 2019) | 1,743 / 2,739 | 399 / 1,045 | 15% / 13% / 10% / 8% (100 / 200 / 300 / 399) | (image) |
| Cuba 15N: Before (1) (Oct 2021) | 243 / 277 | 122 / 123 | 80% / 73% / 60% / 51% (31 / 61 / 92 / 122) | (image) |
| Cuba 15N: Before (2) (Nov 2021) | 1,276 / 1,712 | 413 / 1,018 | 37% / 30% / 24% / 19% (104 / 207 / 310 / 413) | (image) |
| Cuba 15N: After (Nov 2021) | 308 / 478 | 325 / 340 | 38% / 31% / 23% / 18% (82 / 163 / 244 / 325) | (image) |
| Klimarat: Food & Land Use (Apr 2022) | 2,968 / 3,616 | 862 / 1,452 | 26% / 17% / 13% / 10% (216 / 431 / 647 / 862) | (image) |
| Klimarat: Mobility (Apr 2022) | 2,660 / 3,142 | 1,064 / 2,138 | 23% / 17% / 13% / 10% (266 / 532 / 798 / 1,064) | (image) |
| Klimarat: Energy (Apr 2022) | 1,443 / 1,765 | 625 / 1,040 | 28% / 21% / 17% / 14% (157 / 313 / 469 / 625) | (image) |
| Klimarat: Housing (Apr 2022) | 1,261 / 1,503 | 369 / 611 | 34% / 26% / 21% / 17% (93 / 185 / 277 / 369) | (image) |
| Klimarat: Production & Consumption (Apr 2022) | 900 / 1,116 | 337 / 522 | 38% / 30% / 24% / 20% (85 / 169 / 253 / 337) | (image) |
| BG 2050 (Feb 2025) | 6,609 / 7,890 | 3,983 / 7,730 | 12% / 7% / 5% / 4% (996 / 1,992 / 2,988 / 3,983) | (image) |
| Japan Choice: Foreign Affairs & Security (2025) (Jul 2025) | 4,016 / 4,616 | 20 / 20 | 98% / 98% / 98% / 98% (5 / 10 / 15 / 20) | (image) |
| Japan Choice: Diversity & Human Rights (2025) (Jul 2025) | 4,001 / 4,354 | 8 / 8 | 100% / 100% / 100% / 100% (2 / 4 / 6 / 8) | (image) |
| Japan Choice: Education, Children & Old Age Care (2025) (Jul 2025) | 4,285 / 4,723 | 13 / 13 | 99% / 99% / 99% / 99% (4 / 7 / 10 / 13) | (image) |
| Japan Choice: Economy, Taxation & Employment (2025) (Jul 2025) | 10,560 / 12,846 | 18 / 18 | 98% / 98% / 98% / 98% (5 / 9 / 14 / 18) | (image) |
| Japan Choice: Foreign Affairs & Security (2026) (Jan 2026) | 1,653 / 2,140 | 19 / 19 | 100% / 100% / 99% / 99% (5 / 10 / 15 / 19) | (image) |
| Japan Choice: Diversity & Human Rights (2026) (Jan 2026) | 1,546 / 1,833 | 9 / 9 | 100% / 100% / 100% / 100% (3 / 5 / 7 / 9) | (image) |
| Japan Choice: Education, Children & Old Age Care (2026) (Jan 2026) | 1,730 / 1,985 | 8 / 8 | 100% / 100% / 100% / 100% (2 / 4 / 6 / 8) | (image) |
| Japan Choice: Economy, Taxation & Employment (2026) (Jan 2026) | 3,392 / 4,526 | 20 / 20 | 100% / 99% / 98% / 97% (5 / 10 / 15 / 20) | (image) |
Reference Datasets
These datasets are provided as a starting point for exploration and experimentation.
valency_anndata.datasets.aufstehen
Polis conversation of 33k+ Germans, run by political party Aufstehen.
Run in fall 2018, this is the largest Polis conversation to date.
See: https://compdemocracy.org/Case-studies/2018-germany-aufstehen/
The data is pulled from an archive at: https://huggingface.co/datasets/patcon/polis-aufstehen-2018
Note
This dataset has been augmented by merging is-meta and is-seed statement
data (missing from the official CSV export) that were retrieved from the
Polis API. Specifically, is-meta is required in order to reproduce outputs
of the Polis data pipeline.
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r6xd526vyjyjrj9navxrj
Source code in src/valency_anndata/datasets/_load_aufstehen.py
valency_anndata.datasets.american_assembly
american_assembly(
    city: AmericanAssemblyCity | str,
    translate_to: Optional[str] = None,
    **kwargs,
)
Polis conversations run by the American Assembly in Kentucky cities.
The American Assembly is a public affairs organization that has used Polis to facilitate civic dialogue. These conversations were run in Bowling Green and Louisville, Kentucky.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| city | str | The city conversation to load. One of: … | required |
| translate_to | str or None | Target language code (e.g., …) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object containing the loaded Polis conversation. |
Examples:
Load the Bowling Green conversation:
Load the Louisville conversation translated to French:
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project.
Source code in src/valency_anndata/datasets/_load_american_assembly.py
valency_anndata.datasets.bg2050
Polis conversation from the BG 2050 community visioning project.
A 33-day digital engagement where nearly 7,900 residents of Bowling Green and Warren County, Kentucky, shared ideas for the region's future. The project was commissioned by Warren County government in response to projections that the county will nearly double in size over 25 years, and was executed by Innovation Engine in partnership with The Computational Democracy Project and Google's Jigsaw.
See: https://whatcouldbgbe.com/about-the-project
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| translate_to | str or None | Target language code (e.g., …) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object containing the loaded Polis conversation. |
Examples:
Load the BG 2050 conversation:
Load translated to French:
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r7wehfsmutrwndviddnii
Source code in src/valency_anndata/datasets/_load_bg2050.py
valency_anndata.datasets.chile_protest
Polis conversation of 2,700+ Chileans during the 2019 #ChileDesperto protests.
It was run informally by a single citizen, with minimal support infrastructure, outreach strategy, or moderation process.
See: https://en.wikipedia.org/wiki/Social_Outburst_(Chile)
Note
This dataset has been augmented by merging is-meta and is-seed statement
data (missing from the official CSV export) that were retrieved from the
Polis API. Specifically, is-meta is required in order to reproduce outputs
of the Polis data pipeline.
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r29kkytnipymd3exbynkd
Source code in src/valency_anndata/datasets/_load_chile_protest.py
valency_anndata.datasets.cuba_protest
Polis conversations run around Cuba's planned 15N march (November 2021).
The 15N march was a peaceful protest planned for November 15, 2021, but was suppressed by the Cuban government before it could take place. Three conversations were run in sequence — two before the planned march and one after its suppression — allowing longitudinal comparison of public opinion around the event.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| period | str | The conversation period to load. One of: … | required |
| translate_to | str or None | Target language code (e.g., …) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object containing the loaded Polis conversation. |
Examples:
Load the post-protest conversation:
Load the first pre-protest conversation with English translation:
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project.
Source code in src/valency_anndata/datasets/_load_cuba_protest.py
valency_anndata.datasets.japanchoice
Polis conversations from Japan Choice, a Japanese civic engagement platform.
Japan Choice runs Polis conversations on key policy topics ahead of Japanese elections, allowing citizens to share and compare their views on national issues. Conversations are in Japanese.
See: https://japanchoice.jp/polis
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topic | str | The policy topic and year to load. One of: … | required |
| translate_to | str or None | Target language code (e.g., …) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object containing the loaded Polis conversation. |
Examples:
Load the 2025 Economy, Taxation & Employment conversation:
Load the 2026 Foreign Affairs & Security conversation translated to English:
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project.
Source code in src/valency_anndata/datasets/_load_japanchoice.py
valency_anndata.datasets.klimarat
Polis conversations from Austria's Citizens' Climate Council (Klimarat).
The Klimarat der Bürgerinnen und Bürger was Austria's national citizens' assembly on climate policy, convened in 2021–2022. Polis conversations were run for each of five topic areas to gather public input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topic | str | The topic area to load. One of: … | required |
| translate_to | str or None | Target language code (e.g., …) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object containing the loaded Polis conversation. |
Examples:
Load the Energy topic conversation:
Load the Food & Land Use topic conversation with English translation:
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project.
Source code in src/valency_anndata/datasets/_load_klimarat.py
valency_anndata.datasets.vtaiwan
Polis conversations from the vTaiwan collaborative policymaking process.
vTaiwan is a civic deliberation process initiated in 2014 by the g0v community in Taiwan, using Polis to gather citizen perspectives on digital governance and social policy issues. These conversations are in Traditional Chinese.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| topic | str | The policy topic to load. One of: … | required |
| translate_to | str or None | Target language code (e.g., …) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | AnnData object containing the loaded Polis conversation. |
Examples:
Load the Uber conversation:
Load the Airbnb conversation translated to English:
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project.
Source code in src/valency_anndata/datasets/_load_vtaiwan.py
Polis
valency_anndata.datasets.polis.load
load(
    source: str,
    *,
    translate_to: Optional[str] = None,
    build_X: bool = True,
    trim_rule: int | float | str = 1.0,
    skip_cache: bool = False,
    show_progress: bool = True,
    include_precomputed_groups: bool = False,
) -> AnnData
Load a Polis conversation or report into an AnnData object.
This function accepts either a URL or an ID for a Polis conversation or report,
fetches raw vote events and statements via the Polis API or CSV export, and
optionally constructs a participant × statement vote matrix in adata.X.
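As a rough illustration of how a source string might be told apart as a conversation or a report, here is a minimal sketch. The helper `classify_source` is hypothetical and not part of valency_anndata. It assumes that report URLs contain a `/report/` path segment and that bare report IDs start with `r`, which is consistent with the IDs shown on this page but is not confirmed by the library. It does not cover HuggingFace datasets or local directories.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> tuple[str, str]:
    """Return (kind, identifier) for a Polis URL or bare ID.

    Hypothetical helper for illustration only; the real parsing rules
    used by the loader may differ.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Split the URL path into non-empty segments.
        parts = [p for p in parsed.path.split("/") if p]
        if not parts:
            raise ValueError(f"No conversation or report ID in URL: {source!r}")
        # Assumption: report URLs look like https://<host>/report/<id>.
        if parts[0] == "report" and len(parts) > 1:
            return "report", parts[1]
        # Assumption: conversation URLs look like https://<host>/<id>.
        return "conversation", parts[-1]
    # Bare ID. Assumption: report IDs start with "r".
    if source.startswith("r"):
        return "report", source
    return "conversation", source
```

For example, `classify_source("6rphtwwfn4")` yields a conversation, while the report URL from the examples on this page yields a report.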
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The Polis source to load. Supported formats include: … The function will automatically parse the source to determine whether it refers to a conversation or report and fetch the appropriate data. | required |
| translate_to | str or None | Target language code (e.g., "en", "fr", "es") for translating statement text. If provided, the original statement text in … | None |
| build_X | bool | If True, constructs a participant × statement vote matrix from the raw votes using … | True |
| trim_rule | int or float or str | Controls how votes are trimmed by timestamp before building the vote matrix. Passed directly to :func: … Only has effect when … | 1.0 |
| skip_cache | bool | If True, bypass the local file cache and always fetch fresh data from the network. Cached files expire automatically after 24 hours. | False |
| show_progress | bool | If True, display a progress bar when fetching votes from the API (conversation URL/ID only). Uses tqdm, which auto-detects notebooks vs terminal. Has no effect when loading from a report URL or local directory. | True |
| include_precomputed_groups | bool | If True, fetch the Polis math endpoint and store the precomputed group cluster assignments produced by the Polis server in … | False |
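The skip_cache description above mentions a 24-hour expiry. The following is a minimal sketch of what such a freshness check could look like. It assumes the cache is a plain file whose age is judged by its modification time, which is an assumption rather than a description of the library's cache. The name `should_use_cache` and the constant `CACHE_TTL_SECONDS` are hypothetical.

```python
import os
import time
from typing import Optional

# 24 hours, matching the expiry stated in the parameter description.
CACHE_TTL_SECONDS = 24 * 60 * 60


def should_use_cache(path: str, skip_cache: bool = False,
                     now: Optional[float] = None) -> bool:
    """Return True if a cached file exists and is younger than the TTL.

    Hypothetical helper: assumes cache age is the file's modification time.
    """
    if skip_cache or not os.path.exists(path):
        return False
    current = time.time() if now is None else now
    return (current - os.path.getmtime(path)) < CACHE_TTL_SECONDS
```

The `now` argument exists only so the expiry logic can be exercised without waiting a day.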
Returns:
| Name | Type | Description |
|---|---|---|
| adata | AnnData | An AnnData object containing the loaded Polis data. |
|  | DataFrame |  |
|  | dict |  |
|  | DataFrame |  |
|  | dict |  |
|  | dict |  |
|  | dict |  |
|  | ndarray |  |
|  | DataFrame |  |
|  | DataFrame |  |
|  | AnnData |  |
|  | Series |  |
|  | str |  |
Notes
- If build_X=False, only adata.uns will be populated, containing the raw votes and statements, and .X, .obs, .var, and .raw will remain empty.
- adata.raw is assigned only after the first vote matrix build and is intended to be immutable.
- If translate_to is provided, adata.var["content"] is updated with translated text and adata.var["language_current"] is set to the target language.
- The vote matrix is derived from the most recent votes per participant per statement, sorted by timestamp.
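The last note, that the matrix keeps only the most recent vote per participant per statement, can be illustrated with a small standalone sketch. This is not the library's implementation. `VoteEvent` and `build_vote_matrix` are hypothetical names, the fields mirror the votes.csv columns listed on this page, and missing votes are represented as `None` for simplicity.

```python
from typing import Iterable, NamedTuple, Optional


class VoteEvent(NamedTuple):
    timestamp: int
    voter_id: int
    comment_id: int
    vote: int


def build_vote_matrix(events: Iterable[VoteEvent]):
    """Return (voter_ids, comment_ids, matrix) keeping each pair's latest vote."""
    latest: dict[tuple[int, int], int] = {}
    # Process events oldest first so later votes overwrite earlier ones.
    for ev in sorted(events, key=lambda e: e.timestamp):
        latest[(ev.voter_id, ev.comment_id)] = ev.vote
    voters = sorted({voter for voter, _ in latest})
    comments = sorted({comment for _, comment in latest})
    # Rows are participants, columns are statements; None marks "no vote".
    matrix: list[list[Optional[int]]] = [
        [latest.get((voter, comment)) for comment in comments]
        for voter in voters
    ]
    return voters, comments, matrix
```

A participant who votes twice on the same statement contributes only the later value to the matrix.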
Examples:
Load data from a report or conversation ID or URL.
adata = val.datasets.polis.load("https://pol.is/report/r2dfw8eambusb8buvecjt")
adata = val.datasets.polis.load("6rphtwwfn4")
Load data from an alternative Polis instance via URL.
Load data from a HuggingFace dataset.
Load data from a path containing Polis CSV export files.
Source code in src/valency_anndata/datasets/polis.py
valency_anndata.datasets.polis.export_csv
Export an AnnData object to Polis CSV format (votes.csv + comments.csv).
Writes two of the five files from a full Polis data export:
- votes.csv: vote event log (timestamp, datetime, comment-id, voter-id, vote)
- comments.csv: statement metadata (timestamp, datetime, comment-id, author-id, agrees, disagrees, moderated, comment-body)
The remaining three export files are not yet supported: summary.csv, participant-votes.csv (vote matrix), and comment-groups.csv.
Agrees/disagrees are computed from the vote matrix in adata.X.
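A minimal sketch of how per-statement agree and disagree counts can be derived from a vote matrix follows. It assumes votes are encoded as 1 for agree, -1 for disagree, and 0 for pass, with `None` for a missing vote. That encoding is an assumption here, and the real adata.X is an array rather than nested lists. The function name is hypothetical.

```python
from typing import Optional, Sequence


def count_agrees_disagrees(
    matrix: Sequence[Sequence[Optional[int]]],
) -> list[tuple[int, int]]:
    """Return one (agrees, disagrees) pair per statement column."""
    n_cols = len(matrix[0]) if matrix else 0
    counts = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        # Passes (0) and missing votes (None) count toward neither total.
        counts.append((column.count(1), column.count(-1)))
    return counts
```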
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData object produced by :func: … | required |
| path | str | Directory to write the CSV files into. Created if it does not exist. | required |
| include_huggingface_metadata | bool | If True, write a … | False |
Examples:
>>> adata = val.datasets.polis.load("5huyhtuvrm")
>>> val.datasets.polis.export_csv(adata, "./my_export")
Source code in src/valency_anndata/datasets/polis.py
valency_anndata.datasets.polis.translate_statements
translate_statements(
    adata: AnnData,
    translate_to: Optional[str],
    inplace: bool = True,
) -> Optional[list[str]]
Translate statements in adata.uns['statements']['comment-body'] into another language,
or copy originals if translate_to is None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData object containing … | required |
| translate_to | Optional[str] | Target language code (e.g., "en", "fr", "es"). | required |
| inplace | bool | If True, updates … | True |
Returns:
| Name | Type | Description |
|---|---|---|
| translated_texts | list[str] or None | List of translated texts if … |
Source code in src/valency_anndata/datasets/polis.py
1. Kept / total participants. Participants with fewer than 7 votes are excluded.
2. Kept / total statements. Statements with fewer than 2 votes are excluded.
3. Vote matrix completeness at each quartile of statements (25% / 50% / 75% / 100%), ordered by statement ID. Each value is the % of non-missing votes across all kept participants × the first N statements. Statement counts per quartile are shown in parentheses.
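The filters and the completeness measure described in these notes can be sketched in plain Python. This is an illustration under stated assumptions, not the code that generated the table. Missing votes are represented as `None`, a pass counts as a vote, and quartile sizes are rounded up, which matches the counts shown in parentheses in the table (for example 26 / 51 / 76 / 101 for 101 statements). All function names are hypothetical.

```python
from typing import Optional, Sequence

Matrix = Sequence[Sequence[Optional[int]]]  # rows: participants, columns: statements


def kept_participants(matrix: Matrix, min_votes: int = 7) -> list[int]:
    """Row indices of participants with at least min_votes non-missing votes."""
    return [i for i, row in enumerate(matrix)
            if sum(v is not None for v in row) >= min_votes]


def kept_statements(matrix: Matrix, min_votes: int = 2) -> list[int]:
    """Column indices of statements with at least min_votes non-missing votes."""
    n_cols = len(matrix[0]) if matrix else 0
    return [j for j in range(n_cols)
            if sum(row[j] is not None for row in matrix) >= min_votes]


def quartile_counts(n_statements: int) -> list[int]:
    """Statement counts at 25/50/75/100%, rounded up (integer ceiling division)."""
    return [-(-n_statements * q // 4) for q in (1, 2, 3, 4)]


def completeness_by_quartile(matrix: Matrix) -> list[float]:
    """Fraction of non-missing votes over all rows and the first N columns."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    results = []
    for n in quartile_counts(n_cols):
        cast = sum(v is not None for row in matrix for v in row[:n])
        results.append(cast / (n_rows * n) if n_rows and n else 0.0)
    return results
```

The columns are assumed to be already ordered by statement ID, and the order in which the two filters are applied is left to the caller since the notes do not specify it.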
























