Aggregating functions for DOCS
¶
Aggregating file to base DOCS
file¶
- Documents.aggregate(filename: str, *, left_on: None | str | callable = None, right_on: None | str | callable = None, on: str | None = None, subset: None | str | List[str] = None, drop: None | str | List[str] = None, rename: dict | None = None, merge_policy: Literal['merge', 'keep', 'erase', 'replace', 'fill_na'] | None = 'merge', preprocessing: None | Callable | Operation = None, postprocessing: None | Callable | Operation = None, read_method: Callable | None = None, kw_read: dict | None = {})¶
Aggregation of a table from an Excel/CSV file.
The
left_on
andright_on
arguments are column names used to perform a join:left_on
is a column in the central fileeffectif.xlsx
, andright_on
is a column from the file to aggregate. For more complex joins, you can use theguv.helpers.id_slug()
andguv.helpers.concat()
functions (see Examples). Ifleft_on
andright_on
have the same value, you can simply specifyon
.subset
is a list of columns to keep if you don’t want to aggregate all columns.drop
is a list of columns to remove.rename
is a dictionary of columns to rename.read_method
is a callable used to read the file and is called withkw_read
.preprocessing
andpostprocessing
are callables that take a DataFrame as input and return a processed one — they are applied before and after the aggregation, respectively.merge_policy
indicates how to handle merging columns with the same name ineffectif.xlsx
and the aggregated file.- Parameters:
filename (
str
) – Path to the CSV/Excel file to aggregate.left_on (
str
) – Column name in theeffectif.xlsx
file used for the join. You can also use functions likeguv.helpers.id_slug()
andguv.helpers.concat()
for multi-column joins.right_on (
str
) – Column name in the file to aggregate used for the join. Functions likeguv.helpers.id_slug()
andguv.helpers.concat()
are also supported.on (
str
) – Shortcut whenleft_on
andright_on
are the same.subset (
list
, optional) – List of columns to include. By default, all columns are incorporated.drop (
list
, optional) – List of columns to exclude from aggregation.rename (
dict
, optional) – Dictionary to rename columns after incorporation.read_method (
callable
, optional) – Function used to load the file. Pandas functions likepd.read_csv
andpd.read_excel
are automatically selected based on file extension (“.csv”, “.xlsx”).kw_read (
dict
, optional) –Keyword arguments passed to
read_method
. For example, for a “.csv” file:kw_read={"header": None, "names": ["Email", "PW group"]} kw_read={"na_values": "-"}
preprocessing (
callable
, optional) – Pre-processing function applied to the DataFrame before incorporation.postprocessing (
callable
, optional) – Post-processing function applied to the DataFrame after incorporation.merge_policy (
str
, optional) –Strategy for merging columns with the same name:
merge
: Merge only if columns complement each other (NA values).replace
: Use all non-NA values from the file to aggregate.fill_na
: Only replace NA values ineffectif.xlsx
.keep
: Keep the original column fromeffectif.xlsx
without changes.erase
: Overwrite the column with values from the file to aggregate.
Examples
Aggregating columns from a CSV file, matching the
email
column in the CSV withEmail address
in the central file:DOCS.aggregate( "documents/notes.csv", left_on="Email address", right_on="email" )
Aggregating only the
Note
column:DOCS.aggregate( "documents/notes.csv", left_on="Email address", right_on="email" subset="Note" )
Aggregating and renaming the
Note
column toNote_médian
:DOCS.aggregate( "documents/notes.csv", left_on="Email address", right_on="email" subset="Note", rename={"Note": "Note_médian"} )
Aggregating using a CSV without a header:
DOCS.aggregate( "documents/notes.csv", on="Email", kw_read={"header": None, "names": ["Email", "Grade"]}, )
Aggregating based on
Name
andLast name
using a slug ID to allow for flexible matching (ignores accents, cases, hyphens, etc.):from guv.helpers import id_slug DOCS.aggregate( "documents/notes.csv", left_on=id_slug("Name", "Last name"), right_on=id_slug("Name", "Last name") )
Aggregating when the aggregated file has a single column with both names, while the main file separates them:
from guv.helpers import concat DOCS.aggregate( "documents/notes.csv", left_on=concat("Name", "Last name"), right_on="Full_name" )
- Documents.aggregate_jury(filename: str)¶
Aggregates the result of a jury from the task
XlsGradeBookJury
.The equivalent using
aggregate()
is written as:DOCS.aggregate( "generated/jury_gradebook.xlsx", on="Email", subset=["Aggregated grade", "ECTS grade"] )
- Parameters:
filename (
str
) – The path to the file to aggregate.
Examples
DOCS.aggregate_jury("generated/jury_gradebook.xlsx")
- Documents.aggregate_moodle_grades(filename: str, rename: dict | None = None)¶
Aggregates grade sheets exported from Moodle.
The grade sheet must be an export of one or more grades from the Moodle gradebook in the form of an Excel or CSV file.
Unused columns will be removed. Column renaming can be performed by specifying
rename
.- Parameters:
filename (
str
) – The path to the file to aggregate.rename (
dict
, optional) – Dictionary to rename columns after incorporation.
Examples
DOCS.aggregate_moodle_grades("documents/SY02 Notes.xlsx")
- Documents.aggregate_moodle_groups(filename: str, colname: str, backup: bool | None = False)¶
Aggregates group data from the “Group Choice” activity.
Since the group column is always named “Groupe”, the
colname
argument allows you to specify a different name. If the column already exists in the central file, it can be backed up using thebackup
option.- Parameters:
filename (
str
) – The path to the file to aggregate.colname (
str
) – The column name to store group information in.backup (
bool
) – Save the column before making any changes
Examples
DOCS.aggregate_moodle_groups("documents/Paper study groups.xlsx", "Paper")
- Documents.aggregate_org(filename: str, colname: str, on: str | None = None, postprocessing: None | Callable | Operation = None)¶
Aggregation of a file in Org format.
The document to aggregate is in Org format. The headings serve as keys for aggregation, and the content under those headings is aggregated.
- Parameters:
filename (
str
) – Path to the Org file to aggregate.colname (
str
) – Name of the column in which to store the information from the Org file.on (
str
, optional) – Column from theeffectif.xlsx
file used as the key to aggregate with the headings from the Org file. By default, the headings must contain the students’ first and last names.postprocessing (
callable
, optional) – Post-processing to apply to the DataFrame after integrating the Org file.
Examples
Aggregating a file where the headings are student names:
The Org file:
* Bob Morane Frequently absent * Untel See excuse email
The aggregation command:
DOCS.aggregate_org("documents/infos.org", colname="Information")
Aggregating a file where the headings match elements from an existing column. For example, you can aggregate project grades from an Org file, grouped by project team. Specify the column containing the project groups, e.g., “Project1_group”.
The Org file:
* Project1_Group1 A * Project1_Group2 B
The aggregation command:
DOCS.aggregate_org("documents/infos.org", colname="Project 1 Grade", on="Project1_group")
- Documents.aggregate_self(*columns)¶
Aggregation of the central file
effectif.xlsx
itself.It is sometimes more convenient to manually add or modify columns directly in the central file
effectif.xlsx
rather than programmatically usingDOCS.aggregate(...)
, for example. Since guv cannot reliably detect which columns were manually added or modified, you must specify them explicitly using the*columns
argument.- Parameters:
*columns (any number of
str
) – The manually added or modified columns to retain when updating the central file.
Directly modify base DOCS
file¶
- Documents.add(filename, func=None)¶
Adds a file to the central file
Simply adds the file if the central file does not yet exist. Otherwise, uses the function
func
to perform the aggregation. The functionfunc
takes as arguments the existing DataFrame and the path to the file, and returns the updated DataFrame.See specialized functions for incorporating standard documents:
aggregate()
: CSV/Excel documentaggregate_org()
: Org document
- Parameters:
filename (
str
) – The path to the file to aggregate.func (
callable
, optional) – A function with signature DataFrame, filename: str -> DataFrame that performs the aggregation.
Examples
Adding data when there is nothing yet:
DOCS.add("documents/base_listing.xlsx)
Aggregating using a function:
def function_that_incorporates(df, file_path): # Incorporate the file at `file_path` into the DataFrame `df` # and return the updated DataFrame. DOCS.add("documents/notes.csv", func=function_that_incorporates)
- Documents.apply_cell(name_or_email: str, colname: str, value, msg: str | None = None)¶
Replaces the value of a cell.
name_or_email
is the student’s full name or email address, andcolname
is the name of the column where the change should be made. The new value is provided viavalue
.- Parameters:
name_or_email (
str
) – The student’s full name or email address.colname (
str
) – The name of the column where the modification should be made.value – The value to assign.
msg (
str
, optional) – A message describing the operation.
Examples
DOCS.apply_cell("Mark Watney", "DIY Grade", 20)
- Documents.apply_column(colname: str, func: Callable, msg: str | None = None)¶
Modifies an existing column using a function.
colname
is the name of an existing column, andfunc
is a function that takes an element from the column as input and returns a modified value.A
msg
can be provided to describe the operation; it will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
colname (
str
) – Name of the column where replacements will be made.func (
callable
) – Function that takes an element and returns the modified element.msg (
str
, optional) – A message describing the operation.
Examples
DOCS.apply("note", lambda e: float(str(e).replace(",", ".")))
- Documents.apply_df(func: Callable, msg: str | None = None)¶
Modifies the central file using a function.
func
is a function that takes a DataFrame representing the central file and returns the modified DataFrame.A
msg
can be provided to describe what the function does. It will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
func (
callable
) – Function that takes a DataFrame and returns a modified DataFrame.msg (
str
, optional) – A message describing the operation.
Examples
Add a student missing from the official listing:
import pandas as pd df_one = ( pd.DataFrame( { "Last name": ["NICHOLS"], "Name": ["Juliette"], "Email": ["juliette.nichols@silo18.fr"], } ), ) DOCS.apply_df(lambda df: pd.concat((df, df_one)))
Remove duplicate students:
DOCS.apply_df( lambda df: df.loc[~df["Email"].duplicated(), :], msg="Remove duplicate students" )
- Documents.compute_new_column(*cols: str, func: Callable, colname: str, msg: str | None = None)¶
Creating a column from other columns
The columns required for the calculation are specified in
cols
. In case you want to change the column used for the calculation without modifying thefunc
function, you can provide a tuple("col", "other_col")
wherecol
is the name used infunc
andother_col
is the actual column used.The
func
function that computes the new column receives a Pandas Series of all the values from the specified columns.- Parameters:
*cols (list of
str
) – List of columns provided to thefunc
functionfunc (
callable
) – Function that takes a dictionary of “column names/values” as input and returns a computed valuecolname (
str
) – Name of the column to createmsg (
str
, optional) – A message describing the operation
Examples
Weighted average of two grades:
def average(grades): return .4 * grades["Note_médian"] + .6 * grades["Note_final"] DOCS.compute_new_column("Note_médian", "Note_final", func=average, colname="Note_moyenne")
Average ignoring undefined values:
def average(grades): return grades.mean() DOCS.compute_new_column("note1", "note2", "note3", func=average, colname="Note_moyenne")
Recalculation with a modified grade without redefining the
average
function:DOCS.compute_new_column( ("note1", "note1_fix"), "note2", "note3", func=average, colname="Note_moyenne (fix)" )
- Documents.fillna_column(colname: str, *, na_value: str | None = None, group_column: str | None = None)¶
Replaces undefined values in the
colname
columnOnly one of the options
na_value
andgroup_column
should be specified. Ifna_value
is specified, values are unconditionally replaced with the provided value. Ifgroup_column
is specified, values are filled by grouping ongroup_column
and using the only valid value in that column per group.- Parameters:
colname (
str
) – Name of the column whereNA
values will be replacedna_value (
str
, optional) – Value to replace undefined values withgroup_column (
str
, optional) – Name of the column used for grouping
Examples
Replace undefined entries in the
note
column with “ABS”:DOCS.fillna_column("note", na_value="ABS")
Replace undefined entries within each group identified by the
groupe_projet
column with the only defined value in that group:DOCS.fillna_column("note_projet", group_column="groupe_projet")
- Documents.flag(filename_or_string: str, *, colname: str, flags: List[str] | None = ['Oui', ''])¶
Flag a list of students in a new column
The document to aggregate is a list of student names displayed line by line.
- Parameters:
filename_or_string (
str
) – Path to the file to aggregate, or the file content as a string.colname (
str
) – Name of the column in which to place the flag.flags (
str
, optional) – The two flag values used, default is “Yes” and empty.
Examples
Aggregating a file with student names as header:
The file “tiers_temps.txt”:
# Comments Bob Morane # Robust to circular permutation and case aRcTor BoB
The aggregation instruction:
DOCS.flag("documents/tiers_temps.txt", colname="Tiers-temps")
- Documents.replace_column(colname: str, rep_dict: dict, new_colname: str | None = None, backup: bool | None = False, msg: str | None = None)¶
Replacements in a column
Replaces values specified in
rep_dict
in the columncolname
.If the
backup
argument is provided, the column is saved before any modification (with a_orig
suffix). If thenew_colname
argument is provided, the column is copied to a new column namednew_colname
and changes are made on this new column.A message
msg
can be specified to describe what the function does; it will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
colname (
str
) – Name of the column in which to perform replacementsrep_dict (
dict
) – Dictionary of replacementsnew_colname (
str
) – Name of the new columnbackup (
bool
) – Save the column before making any changesmsg (
str
, optional) – A message describing the operation
Examples
DOCS.replace_column("group", {"TD 1": "TD1", "TD 2": "TD2"})
ECTS_TO_NUM = { "A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0 } DOCS.replace_column("Note_TP", ECTS_TO_NUM, backup=True)
- Documents.replace_regex(colname: str, *reps, new_colname: str | None = None, backup: bool | None = False, msg: str | None = None)¶
Regex replacements in a column
Replaces, in the
colname
column, all occurrences matching the regular expressions specified inreps
.If the
backup
argument is provided, the column is saved before any modification (with a_orig
suffix). If thenew_colname
argument is provided, the column is copied to a new column namednew_colname
and the modifications are applied to that new column.A
msg
message can be specified to describe what the function does; it will be displayed when the aggregation is performed. Otherwise, a generic message will be shown.- Parameters:
colname (
str
) – Name of the column in which to perform replacements*reps (any number of
tuple
) – Pairs of regex / replacementnew_colname (
str
) – Name of the new columnbackup (
bool
) – Save the column before making any changesmsg (
str
, optional) – A message describing the operation
Examples
DOCS.replace_regex("group", (r"group([0-9])", r"G\1"), (r"g([0-9])", r"G\1"))
- Documents.switch(filename_or_string: str, *, colname: str, backup: bool = False, new_colname: str | None = None)¶
Performs value swaps in a column
The
colname
argument specifies the column in which to perform the swaps. If thebackup
argument is provided, the column is saved before any modification (with a_orig
suffix). If thenew_colname
argument is provided, the column is copied to a new column namednew_colname
and the modifications are applied to this new column.- Parameters:
filename_or_string (
str
) – Path to the file to aggregate, or the content of the file as a string.colname (
str
) – Name of the column where the changes will be appliedbackup (
bool
) – Save the column before making any changesnew_colname (
str
) – Name of the new column
Examples
DOCS.switch("fichier_échange_TP", colname="TP")
DOCS.switch("Dupont --- Dupond", colname="TP")