Following the increased volume of child maltreatment cases during the Covid-19 pandemic, and in light of limited resources, a Special Committee seeks input from various stakeholders on the design of an Automated Decision System (ADS) that uses machine learning to identify children at risk of maltreatment. The screening system will focus on calls placed to a hotline, where information about suspected child maltreatment is disclosed to social-worker interns.
Based on the call and on additional information logged in the system or pulled from government and public databases, the system will automatically decide which calls to screen out as requiring no further action, and which calls to flag for further investigation by a certified social worker.
In our data, 30% of the children are at risk.
In a perfect world, only calls regarding children at risk would be flagged for further investigation.
But models (and also humans) aren't perfect.
The model might make a mistake and fail to flag a child who is at risk.
Or the opposite: the model might flag a child as at risk when the child is not.
One approach would be to have the model flag children at risk “aggressively”, so that close calls are decided as “at risk” and the model rarely misses a child who is at risk.
We can evaluate this model by counting how many children are correctly and incorrectly flagged as at risk:
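As a rough sketch of that counting, the snippet below builds the four counts (correctly flagged, needlessly flagged, missed, correctly screened out) from per-call risk scores and a decision threshold. The scores, labels, and thresholds are hypothetical and only illustrate the mechanics; they are not the committee's model or data.

```python
# Minimal sketch: counting correct and incorrect "at risk" flags at a given threshold.
# All scores, labels, and thresholds below are hypothetical, for illustration only.

def count_flags(scores, labels, threshold):
    """Return (correctly flagged, needlessly flagged, missed, correctly screened out)."""
    tp = fp = fn = tn = 0
    for score, at_risk in zip(scores, labels):
        flagged = score >= threshold
        if flagged and at_risk:
            tp += 1   # correctly flagged as at risk
        elif flagged and not at_risk:
            fp += 1   # flagged, but the child is not actually at risk
        elif not flagged and at_risk:
            fn += 1   # missed a child who is at risk
        else:
            tn += 1   # correctly screened out
    return tp, fp, fn, tn

# Hypothetical model scores (higher = more likely at risk) and true outcomes.
scores = [0.92, 0.35, 0.55, 0.12, 0.71, 0.81, 0.20, 0.64]
labels = [True, False, True, False, False, True, False, False]

# An "aggressive" (low) threshold flags more calls, so fewer at-risk children are missed.
print(count_flags(scores, labels, threshold=0.3))  # (3, 3, 0, 2)
print(count_flags(scores, labels, threshold=0.7))  # (2, 1, 1, 4)
```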
On the other hand, flagging aggressively means certified social workers are allocated to cases where no further investigation is required, wasting public resources and unnecessarily burdening families of children who are not truly at risk.
These issues and trade-offs in model optimization aren't new, but they come into sharp focus when we can fine-tune exactly how aggressively the model flags children as being at risk.
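Fine-tuning that aggressiveness usually amounts to moving a decision threshold and watching how the counts shift. The sketch below sweeps a few hypothetical thresholds over the same kind of made-up scores and labels; it only shows how the trade-off moves, not actual operating points.

```python
# Minimal sketch: sweeping the decision threshold to see how the trade-off shifts.
# Scores, labels, and thresholds are hypothetical; in practice they would come
# from a held-out evaluation set, not the training data.

scores = [0.92, 0.35, 0.55, 0.12, 0.71, 0.81, 0.20, 0.64]
labels = [True, False, True, False, False, True, False, False]

for threshold in (0.2, 0.4, 0.6, 0.8):
    flagged = [s >= threshold for s in scores]
    caught   = sum(f and y for f, y in zip(flagged, labels))      # at-risk children flagged
    missed   = sum(not f and y for f, y in zip(flagged, labels))  # at-risk children missed
    needless = sum(f and not y for f, y in zip(flagged, labels))  # families investigated unnecessarily
    print(f"threshold={threshold:.1f}  caught={caught}  missed={missed}  unnecessary investigations={needless}")
```

Lowering the threshold trades missed children for unnecessary investigations, and raising it does the reverse; choosing the operating point is a policy decision, not a purely technical one.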
Sometimes it is useful to gather additional data to improve the predictive power of the model.
But do all data sources contribute equally? And what are the consequences of using them in terms of rights and values?
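One common way to ask whether a data source contributes predictively is an ablation: train the same model with and without the features from that source and compare held-out performance. The sketch below uses synthetic data and scikit-learn purely for illustration; the feature groups and numbers are made up, and predictive contribution alone says nothing about whether using a source is appropriate.

```python
# Minimal ablation sketch: does adding a feature group change held-out performance?
# Synthetic data and feature groupings are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
base = rng.normal(size=(n, 3))    # e.g. features from the call log itself
extra = rng.normal(size=(n, 2))   # e.g. features pulled from an external database

# Synthetic outcome: depends mainly on the base features and weakly on one extra feature.
logits = 1.2 * base[:, 0] - 0.8 * base[:, 1] + 0.3 * extra[:, 0]
y = rng.random(n) < 1 / (1 + np.exp(-logits))

def heldout_auc(X, y):
    """Train on a split and report AUC on the held-out portion."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print("base features only:      ", round(heldout_auc(base, y), 3))
print("base + external features:", round(heldout_auc(np.hstack([base, extra]), y), 3))
```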
Now it is your turn to decide (1) which data sources to use and which to exclude; and (2) how aggressive the model should be when flagging calls about children at risk.
Adapted from an AI Explorable by PAIR, made by Adam Pearce, licensed under the Apache License 2.0.
Silhouettes from ProPublica's Wee People.