As India embraces AI as an instrument for economic growth and social progress, it is necessary to examine the foundations and practices upon which such promises of AI are expected to materialize. This study examines the data practices entailed in building AI systems for healthcare and agriculture in India.


With agriculture the only sector in India to show positive growth in 2020, and the Indian healthcare sector rising to the challenge of a global pandemic, these two sectors have demonstrated surprising resilience over the last couple of years. It makes sense, then, that they are priority sectors for India’s developing national technology policy and its even faster-growing tech industry. Opportunity is rife, and it’s taking shape in the form of Artificial Intelligence (AI) systems that could impact millions.

AI systems are fast becoming the catch-all solution for innovation across many sectors, particularly health and agriculture. These systems are designed to enable machines to perform (often tiresome and complex) human-like tasks, and to adapt, learn and continue to work based on new inputs. They’re meant to make things like eye screenings, patient data registries, soil health readings, and farmer production data more comprehensive, accessible, and easy to use. That is simple enough when the AI systems are trained using good-quality data and sound data practices.

Predictably, that just isn’t the case. The data used to build these systems, and the processes used to collect it, label it and make decisions based on it (i.e., data practices), don’t exist in a vacuum. Data practices take on a life of their own, shaped by norms and world-views that depend largely on who is involved (or not) at each stage. The data acquired and produced could be gold or could be garbage, and that determines whether the resulting AI systems will be too.

The landscape of data and data practices in agriculture and healthcare is chaotic, fragmented and inconsistent. Private tech players often have to collect data from scratch because existing data just isn’t usable. The entry of these new players means there are more intermediaries at each stage, which increases the possibility of error, while too few contextual stakeholders are involved, which lets private players do things their way in the name of profit. The partnerships formed (private and public, private and stakeholder) are also couched in opaque agreements, with no avenue for public scrutiny or an equal exchange of knowledge. And with all this focus on data and how it's processed, the humans behind the AI get lost in the mix too. There is no real indication of how frontline medical workers and farmers are compensated for, or benefit from, their work and knowledge. Many social questions around privacy, fairness and transparency, and technical questions around accuracy, usability and relevance, remain unanswered on purpose.

Focused on the data collection and annotation (processing) stages, this latest report from the Digital Futures Lab examines and analyses the practices involved in building AI systems for healthcare and agriculture in India. The Digital Futures Lab recognises that if this is the direction in which India’s economic and social progress is headed, stronger foundations to mitigate harm need to be built now, to prevent band-aid solutions later. The report is part of a larger attempt to develop and contribute to a proactive, bottom-up agenda for Responsible AI in India, starting with what makes AI tick: data, and the people behind it.

This report identifies six actionable pathways toward building responsible data practices in these sectors, and for Responsible AI in India in general.


You can also read a graphic short based on this report, which highlights the challenges faced by the different workers who are integral to the data life cycles of healthcare and agriculture in India ⬇

Towards Responsible Data Practices.pdf