Natural Language Processing for Humanitarian Survey Work Challenge Statement

Challenge presented by International Rescue Committee (IRC)

The problem

  • Better understand the needs, desires, and opinions of the affected population (especially beneficiaries) by extending regular surveys to include remote data collection and processing of qualitative responses to open-ended questions.
  • Maintain communication with the affected population, including beneficiaries and non- or potential beneficiaries, about services provided and decisions taken.


  • Understanding urgent population needs is crucial both during the first days of an emergency and on a continuous basis to monitor changes. Information needs vary geographically (single neighborhood vs. countrywide), by domain (e.g. multisectoral vs. food security), and by source (overall population vs. beneficiaries).
  • Needs assessments (with the overall affected population) and specific surveys with beneficiaries are restricted due to physical boundaries (security or logistics).
  • There is a need to maintain communication with the affected population about the services provided (and not provided) for the sake of transparency and maintaining good relationships with the community.
  • Quality: Current primary data collection through interviews with individuals is costly, forcing us to rely mainly on closed-ended questions with limited answer options (e.g. yes/no, scales) rather than qualitative questions with long open-ended responses. Recording, transcribing, and analyzing in-depth responses from affected people is not feasible with current resources. We are also time-bound, because the data and analysis must stay relevant to the ever-changing contexts of humanitarian crises.
  • Quantity: Collecting data from people about the services they are using is currently limited to exit surveys, satisfaction boxes, hotlines and focus group discussions (FGD), which don’t reach everyone and can be very resource intensive. As a result, feedback is insufficient and too slow to improve programming that delivers the most relevant services. “Closing the loop” on feedback, in terms of providing a response/answer/resolution to an inquiry or problem, has been a challenge.
  • Many people in emergencies tend to prefer in-person interactions. How can we leverage new technologies without completely losing the personal touch when collecting and responding to feedback from beneficiaries, and when providing information about services provided (and possibly not provided)? How do we use tech in communities that have not engaged much with technology before, don’t own phones, or live in areas with no or limited cell network?
  • Few humanitarian emergencies are in English-speaking countries. How can automated transcription and analysis of voice responses be done in non-Western languages?
  • How can the analysis be automated in a way that avoids common biases, does not miss important themes or trends, and reduces manual labor?

What are some possible solutions to this challenge, and what are their limitations?

  • Computer-assisted phone interviews can transcend physical boundaries and lower data collection costs, but they do not solve the lack of qualitative understanding.
  • Surveys could be automated (through interactive voice response): Rather than asking a series of closed-ended questions, respondents would answer a few open-ended questions. Recordings could be transcribed automatically. But reading and analyzing a large volume of such responses might still be too time-consuming. There is no off-the-shelf system that transcribes and assists with the analysis of qualitative data.
  • Instead of only requesting information, should the same call also allow the provision of information? For example, people could dial a number and then go through a menu: “for a list of health services, press 1”.
  • Maintaining contact with community leaders via some form of phone technology (as above), with the expectation/assumption that information will be shared face-to-face with the relevant members of the community. This requires an understanding of the community structure, as well as of which groups (by sex, age, ethnicity, etc.) may be on the periphery of community structures.
  • Limitations for most, if not all, tech: Not everyone owns a phone, not all locations have network coverage. In some contexts, this technology will not be possible to roll out. In other contexts, non-phone owners may not be able to access it. Can tech be used to connect with communities/individuals who have not, traditionally, been exposed to much technology use?
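
The interactive voice response idea above, where a single call both provides information and collects an open-ended answer, can be sketched as a small menu tree. This is a minimal illustration only, not an existing IRC system: the `MenuNode` class, the `walk` helper, and the bracketed prompt texts are all hypothetical, and the sketch covers only the menu logic, not telephony, recording, or transcription.

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One step of a hypothetical IVR call flow."""
    prompt: str                                   # audio prompt played to the caller
    options: dict = field(default_factory=dict)   # key press -> next MenuNode
    record_response: bool = False                 # True if the caller's voice is recorded here


def build_menu() -> MenuNode:
    # Prompt texts are placeholders; real prompts would be recorded in local languages.
    services = MenuNode(prompt="[recorded list of health services]")
    feedback = MenuNode(
        prompt="[open-ended question: what are your most urgent needs?]",
        record_response=True,
    )
    return MenuNode(
        prompt="[for a list of health services, press 1; to share feedback, press 2]",
        options={"1": services, "2": feedback},
    )


def walk(menu: MenuNode, keys: list) -> tuple:
    """Follow a sequence of key presses.

    Returns the prompts played, in order, and whether the call ends on a
    node that records an open-ended voice response.
    """
    node = menu
    played = [node.prompt]
    for key in keys:
        nxt = node.options.get(key)
        if nxt is None:
            # Unknown key: say so and replay the current prompt.
            played.append("[option not available]")
            played.append(node.prompt)
            continue
        node = nxt
        played.append(node.prompt)
    return played, node.record_response
```

Calling `walk(build_menu(), ["2"])` ends on the feedback node, where a voice recording would be captured for later transcription; pressing `1` instead only plays information back to the caller.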

What has been done so far?

  • Low tech: Satisfaction boxes, hotlines, FGDs, exit surveys, recording complaints/feedback ad hoc (as it is heard/given), face-to-face household surveys, key informant interviews (KIIs), and FGDs for needs assessments, and engaging with community leaders and the community during monthly meetings.

Could you list one or more user stories for the most important cases that should be solved by this solution?

  • As a project coordinator/team lead, I want to know how the community perceives our presence so that we can determine how to 1) improve our relationship with the community, 2) increase access to and use of our services, and 3) mitigate any issues that could impact my team’s safety and security.
  • As a project coordinator/team lead, I want to know what actual users/beneficiaries think of our services so that we can continue what is working and course-correct what is not.
  • As a project coordinator/manager/officer, I want to be able to communicate with the community (as a group and individuals) to explain how we are responding to their input, address concerns, and answer questions about our services.
  • As a project coordinator/manager/team lead, I want the community to know what services we provide, to whom, and where, so that people know that the services exist and how to access them.
  • I am an affected person who has been excluded from services by my community leader. I want someone from IRC to contact me confidentially so I can share my concern.
  • I am an affected person whose needs have changed since the beginning of the response. I want to be asked by IRC what my needs are now, and to understand what they do based on my response.
  • I am an affected person who feels that surveys with tick-box answers about my situation really don’t let me share what I need or what is happening with me. I have ideas and concerns, and I want to share them in a confidential way, but I am not going to take the initiative to approach the IRC with them.

Who are the key players (so far or should be engaged in future)?

  • Members of the affected population
  • Community leaders
  • Humanitarian actors
  • Relevant tech providers/actors

What are the possibilities you see for better leveraging emerging data science & AI capabilities?

  • Natural language processing would be needed to automate transcription and analysis (as well as translation where necessary)
  • Sentiment analysis
  • Emerging data science could help with processing large amounts of data before and after analysis, e.g. helping the field analyst take full advantage of all the trends identified, and helping with data cleaning by spotting outliers or fake data.
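
As a rough illustration of how transcribed open-ended responses might be pre-processed for a field analyst, the sketch below tags each response with themes from a keyword list, keeps untagged responses visible so that unexpected themes are not silently missed, and flags very short or duplicated texts for data-cleaning review. It is a deliberately simple, assumption-laden baseline rather than a recommended method: the `THEMES` lexicon and all function names are hypothetical, the keyword matching assumes languages written with spaces between words, and real sentiment or theme analysis would need models and lexicons validated in the survey language.

```python
import re
from collections import Counter

# Hypothetical theme lexicon, for illustration only. A real lexicon would be
# built with field staff, in the language of the survey.
THEMES = {
    "food": {"food", "hungry", "rations"},
    "health": {"clinic", "medicine", "doctor"},
    "safety": {"unsafe", "afraid", "attack"},
}


def tokenize(text: str) -> list:
    """Lowercase and split into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def tag_themes(text: str) -> list:
    """Return the sorted themes whose keywords appear in the text."""
    words = set(tokenize(text))
    return sorted(theme for theme, keywords in THEMES.items() if words & keywords)


def summarize(responses: list) -> tuple:
    """Count themes across responses and collect responses matching no theme.

    Untagged responses are returned rather than dropped, so an analyst can
    read them and spot themes the lexicon does not cover.
    """
    counts = Counter()
    untagged = []
    for response in responses:
        themes = tag_themes(response)
        counts.update(themes)
        if not themes:
            untagged.append(response)
    return counts, untagged


def flag_for_review(responses: list, min_words: int = 3) -> list:
    """Flag responses that are very short or that repeat another response exactly.

    Flags are prompts for a human check, not proof that data is fake.
    """
    normalized = [" ".join(tokenize(r)) for r in responses]
    seen = Counter(normalized)
    flags = []
    for index, text in enumerate(normalized):
        reasons = []
        if len(text.split()) < min_words:
            reasons.append("very short")
        if seen[text] > 1:
            reasons.append("duplicate text")
        if reasons:
            flags.append((index, reasons))
    return flags
```

In use, `summarize` gives the analyst a count per theme plus the list of responses that still need to be read in full, while `flag_for_review` returns the positions of responses worth a second look before analysis.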