Csv
read_organism_trends(file, pdf_id_column='Key', columns=None, remove_nan=True)
Read organism trends from a CSV file. There are multiple trends per pdf ID, so they are grouped into lists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file` | `str` | Path to the CSV file. | required |
| `pdf_id_column` | `str` | Name of the column containing the pdf IDs. | `'Key'` |
| `remove_nan` | `bool` | Whether to remove NaN values from the dictionaries. | `True` |
| `columns` | `list[str] \| None` | Optional list of columns to read from the CSV file. If not provided, all columns are read. | `None` |
Returns: A dictionary mapping each pdf ID to its organism trends, represented as a list of dictionaries (one dictionary per trend).
Source code in src/kibad_llm/dataset/csv.py
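The grouping behaviour described above can be illustrated with a small standalone sketch. It is not the code in `src/kibad_llm/dataset/csv.py`: the name `read_trends_sketch` is hypothetical, it uses only the standard-library `csv` module, and it treats an empty cell as the missing-value case in place of NaN. Whether the real function keeps the pdf ID column inside each trend dictionary is not stated here, so the sketch keeps it unless `columns` excludes it.

```python
import csv
import os
import tempfile
from collections import defaultdict


def read_trends_sketch(file, pdf_id_column="Key", columns=None, remove_nan=True):
    """Group CSV rows by pdf ID (hypothetical stand-in, not the library code)."""
    grouped = defaultdict(list)
    with open(file, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            pdf_id = row[pdf_id_column]
            # Keep only the requested columns; None means keep all of them.
            if columns is not None:
                row = {key: value for key, value in row.items() if key in columns}
            # Assumption: an empty (or absent) cell plays the role of NaN here.
            if remove_nan:
                row = {key: value for key, value in row.items() if value not in ("", None)}
            grouped[pdf_id].append(row)
    return dict(grouped)


# Made-up sample data: two trends for pdf ID "A1", one for "B2".
sample = "Key,organism,trend\nA1,beetle,increase\nA1,moth,\nB2,newt,decrease\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trends.csv")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(sample)
    trends = read_trends_sketch(path)
    subset = read_trends_sketch(path, columns=["organism"], remove_nan=False)

print(trends)
print(subset)
```

In this sketch the second row for `A1` loses its empty `trend` entry, which mirrors what `remove_nan=True` is documented to do for NaN values.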
remove_nan_from_dict(d)
Remove keys with NaN values from a dictionary.
Source code in src/kibad_llm/dataset/csv.py
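As a rough illustration of what removing NaN-valued keys can look like, here is a minimal sketch. The name `remove_nan_sketch` is hypothetical, and the check via `math.isnan` on float values is an assumption; the actual helper may detect missing values differently (for example through a dataframe library).

```python
import math


def remove_nan_sketch(d):
    """Return a copy of d without entries whose value is a float NaN."""
    return {
        key: value
        for key, value in d.items()
        # Assumption: only float NaN counts as missing; None and "" are kept.
        if not (isinstance(value, float) and math.isnan(value))
    }


# Made-up example row with one NaN value.
row = {"organism": "beetle", "trend": float("nan"), "count": 3}
cleaned = remove_nan_sketch(row)
print(cleaned)
```

The sketch returns a new dictionary and leaves the input untouched, which keeps it safe to call on rows that are reused elsewhere.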