Datasets
Reference Datasets
These datasets are provided as a starting point for exploration and experimentation.
valency_anndata.datasets.aufstehen
Polis conversation with 33k+ German participants, run by the German political movement Aufstehen.
Run in fall 2018, it is the largest Polis conversation to date.
See: https://compdemocracy.org/Case-studies/2018-germany-aufstehen/
The data is pulled from an archive at: https://huggingface.co/datasets/patcon/polis-aufstehen-2018
Note
This dataset has been augmented by merging is-meta and is-seed statement
data (missing from the official CSV export) that were retrieved from the
Polis API. In particular, is-meta is required to reproduce the outputs
of the Polis data pipeline.
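As a rough illustration of how these flags are used, the sketch below joins is-meta and is-seed onto statement rows by comment-id, then drops meta statements before analysis. It is a minimal stdlib sketch, not this package's loader: the toy rows are invented, the flag records stand in for Polis API responses, and the assumption that the Polis pipeline excludes meta statements from analysis is ours.

```python
# Hypothetical sketch: field names follow the Polis comments CSV export
# ("comment-id", "comment-body"); the flag records stand in for data
# fetched separately from the Polis API.
def merge_statement_flags(statements, api_flags):
    """Attach is-meta/is-seed flags to each statement row, keyed by comment-id."""
    flags_by_id = {f["comment-id"]: f for f in api_flags}
    merged = []
    for row in statements:
        flags = flags_by_id.get(row["comment-id"], {})
        merged.append({
            **row,
            # Default to False when no flags were returned for this statement.
            "is-meta": bool(flags.get("is-meta", False)),
            "is-seed": bool(flags.get("is-seed", False)),
        })
    return merged


def analysis_statement_ids(merged):
    """Return ids of statements kept for analysis (assumption: meta statements dropped)."""
    return [row["comment-id"] for row in merged if not row["is-meta"]]


# Invented toy data for illustration only.
statements = [
    {"comment-id": 0, "comment-body": "Housing should be affordable."},
    {"comment-id": 1, "comment-body": "Please vote on every statement."},
    {"comment-id": 2, "comment-body": "Public transit should be free."},
]
api_flags = [
    {"comment-id": 0, "is-meta": False, "is-seed": True},
    {"comment-id": 1, "is-meta": True, "is-seed": True},
]

merged = merge_statement_flags(statements, api_flags)
print(analysis_statement_ids(merged))  # [0, 2]
```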
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r6xd526vyjyjrj9navxrj
Source code in src/valency_anndata/datasets/_load_aufstehen.py
valency_anndata.datasets.chile_protest
Polis conversation of 2,700+ Chileans during the 2019 #ChileDesperto protests.
It was run informally by a single citizen, with minimal support infrastructure, outreach strategy, or moderation process.
See: https://en.wikipedia.org/wiki/Social_Outburst_(Chile)
Note
This dataset has been augmented by merging is-meta and is-seed statement
data (missing from the official CSV export) that were retrieved from the
Polis API. In particular, is-meta is required to reproduce the outputs
of the Polis data pipeline.
Attribution
Data was gathered using the Polis software (see: https://compdemocracy.org/polis and https://github.com/compdemocracy/polis) and is sub-licensed under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more information about how the data was collected can be found at the following link: https://pol.is/report/r29kkytnipymd3exbynkd
Source code in src/valency_anndata/datasets/_load_chile_protest.py
Polis
valency_anndata.datasets.polis.load
Load a Polis conversation or report into an AnnData object.
This function accepts a URL or ID for a Polis conversation or report, or a local
path to Polis CSV export files. It fetches raw vote events and statements via the
Polis API or CSV export, and optionally constructs a participant × statement vote
matrix in adata.X.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The Polis source to load. Supported formats include: … The function automatically parses the source to determine whether it refers to a conversation or report and fetches the appropriate data. | required |
| translate_to | str or None | Target language code (e.g., "en", "fr", "es") for translating statement text. If provided, the original statement text in … | None |
| build_X | bool | If True, constructs a participant × statement vote matrix from the raw votes using … | True |
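The source parsing described for the source parameter might look roughly like the following. This is a hypothetical helper (classify_polis_source is not part of valency_anndata), assuming that a local directory means CSV export files, that report URLs contain a "report" path segment, and that bare report IDs start with "r", as in the example IDs on this page (r2dfw8eambusb8buvecjt versus 6rphtwwfn4). Defaulting bare IDs to https://pol.is is also an assumption.

```python
import os
from urllib.parse import urlparse


def classify_polis_source(source: str):
    """Hypothetical helper: return (kind, base_url, identifier) for a Polis source.

    kind is "csv_export", "report", or "conversation". The ID-prefix rule for
    bare IDs is an assumption based on the example IDs in these docs.
    """
    # A local directory is treated as a folder of Polis CSV export files.
    if os.path.isdir(source):
        return ("csv_export", None, source)

    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Keep scheme and host so non-pol.is instances can be targeted.
        base_url = f"{parsed.scheme}://{parsed.netloc}"
        segments = [s for s in parsed.path.split("/") if s]
        if not segments:
            raise ValueError(f"No Polis ID found in URL: {source}")
        identifier = segments[-1]
        kind = "report" if "report" in segments[:-1] else "conversation"
        return (kind, base_url, identifier)

    # Bare ID: fall back to the (assumed) prefix heuristic and default host.
    kind = "report" if source.startswith("r") else "conversation"
    return (kind, "https://pol.is", source)


print(classify_polis_source("https://pol.is/report/r2dfw8eambusb8buvecjt"))
print(classify_polis_source("6rphtwwfn4"))
```

Keeping the scheme and host from the URL is what would let the same logic point at an alternative Polis instance.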
Returns:

| Name | Type | Description |
|---|---|---|
| adata | AnnData | An AnnData object containing the loaded Polis data. |
| | DataFrame | |
| | dict | |
| | DataFrame | |
| | dict | |
| | dict | |
| | dict | |
| | ndarray | |
| | DataFrame | |
| | DataFrame | |
| | AnnData | |
Notes
- If build_X=False, only adata.uns will be populated, containing the raw votes and statements, and .X, .obs, .var, and .raw will remain empty. adata.raw is assigned only after the first vote matrix build and is intended to be immutable.
- If translate_to is provided, adata.var["content"] is updated with translated text and adata.var["language_current"] is set to the target language.
- The vote matrix is built from each participant's most recent vote on each statement, as determined by timestamp.
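The notes above describe keeping only each participant's most recent vote per statement. A minimal stdlib sketch of that rule follows. It is not this package's implementation: the field names mirror the Polis votes CSV export (voter-id, comment-id, vote, timestamp), and using None for missing votes, rather than NaN in a NumPy array, is a simplification.

```python
def build_vote_matrix(votes):
    """Build a participant x statement matrix from raw vote events.

    Each vote is a dict with "voter-id", "comment-id", "vote", "timestamp".
    Only each participant's most recent vote per statement is kept; cells
    with no vote are None.
    """
    latest = {}
    # Sort by timestamp so later votes overwrite earlier ones.
    for v in sorted(votes, key=lambda v: v["timestamp"]):
        latest[(v["voter-id"], v["comment-id"])] = v["vote"]

    participants = sorted({p for p, _ in latest})
    statements = sorted({s for _, s in latest})
    row_of = {p: i for i, p in enumerate(participants)}
    col_of = {s: j for j, s in enumerate(statements)}

    matrix = [[None] * len(statements) for _ in participants]
    for (p, s), value in latest.items():
        matrix[row_of[p]][col_of[s]] = value
    return participants, statements, matrix


votes = [
    {"voter-id": 0, "comment-id": 0, "vote": 1, "timestamp": 100},
    {"voter-id": 0, "comment-id": 0, "vote": -1, "timestamp": 200},  # changed vote
    {"voter-id": 1, "comment-id": 1, "vote": 0, "timestamp": 150},
]
participants, statements, X = build_vote_matrix(votes)
print(X)  # [[-1, None], [None, 0]]
```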
Examples:
Load data from a report or conversation ID or URL.

```python
import valency_anndata as val

adata = val.datasets.polis.load("https://pol.is/report/r2dfw8eambusb8buvecjt")
adata = val.datasets.polis.load("6rphtwwfn4")
```
Load data from an alternative Polis instance via URL.
Load data from a path containing Polis CSV export files.
Source code in src/valency_anndata/datasets/polis.py
valency_anndata.datasets.polis.translate_statements

```python
translate_statements(
    adata: AnnData,
    translate_to: Optional[str],
    inplace: bool = True,
) -> Optional[list[str]]
```
Translate statements in adata.uns['statements']['comment-body'] into another language,
or copy originals if translate_to is None.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | AnnData object containing … | required |
| translate_to | Optional[str] | Target language code (e.g., "en", "fr", "es"). | required |
| inplace | bool | If True, updates … | True |
Returns:

| Name | Type | Description |
|---|---|---|
| translated_texts | list[str] \| None | List of translated texts if … |
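The exact in-place behavior is not spelled out above, so the following is only a sketch consistent with the signature (an Optional[list[str]] return and inplace=True by default). It is hypothetical throughout: it works on a plain dict of lists instead of AnnData, takes a caller-supplied translate function instead of a real translation service, stores results under an invented "translated-body" key, and assumes None is returned when inplace=True.

```python
from typing import Callable, Optional


def translate_statements_sketch(
    statements: dict,
    translate_to: Optional[str],
    translate: Callable[[str, str], str],
    inplace: bool = True,
) -> Optional[list[str]]:
    """Hypothetical stand-in for translate_statements on a plain dict.

    statements maps column names to lists (like adata.uns['statements']);
    translate(text, target_language) is supplied by the caller.
    """
    originals = list(statements["comment-body"])
    if translate_to is None:
        # No target language: copy the original text unchanged.
        texts = originals
    else:
        texts = [translate(text, translate_to) for text in originals]

    if inplace:
        # Assumption: results are written back and nothing is returned.
        statements["translated-body"] = texts
        return None
    return texts


# Toy translator backed by a lookup table (illustration only).
toy = {"comment-body": ["hola", "gracias"]}
lookup = {"hola": "hello", "gracias": "thanks"}
result = translate_statements_sketch(toy, "en", lambda t, lang: lookup[t], inplace=False)
print(result)  # ['hello', 'thanks']
```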