Automated Content Quality Assurance for Crowdsourcing Educational Platforms
The ACQUA (Automated Content Quality Assurance) system was developed to provide a near real-time feed of text classifications over a dozen labels.
The success of social learning platforms depends strongly on the quality of the content created and maintained by the community. Being able to surface only relevant, high-quality content is crucial for guaranteeing the trust and safety of users, enabling the personalization of their learning experience, and avoiding the side effects of gamification.
In this talk, we will analyze the Brainly Community Q&A platform, which allows more than 350M students worldwide to ask questions and seek help with their homework and studying.
We will dig deeper into the ACQUA (Automated Content Quality Assurance) system, which was developed to provide a near real-time feed of text classifications over a dozen labels, such as toxicity, spam, gibberish, incomplete questions, non-educational questions, personal identifiers, grammar mistakes, presence of irrelevant details, readability, foreign language, and wrong subject.
The blacklist feed is then used for content reporting and moderation, as well as for other downstream tasks such as recommender systems, information retrieval, user reputation models, and various data analytics purposes.
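As a rough illustration of what one record in such a multi-label feed might look like, here is a minimal Python sketch. It is not ACQUA's implementation: the names (`FeedRecord`, `classify`, `SCORERS`), the 0.5 threshold, and the toy heuristics standing in for trained models are all assumptions made for this example, and only three of the labels listed above are covered.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Toy stand-ins for trained classifiers. Each returns a score in [0, 1].
# These heuristics are assumptions for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def score_personal_identifiers(text: str) -> float:
    """Flag text that contains something shaped like an e-mail address."""
    return 1.0 if EMAIL_RE.search(text) else 0.0


def score_gibberish(text: str) -> float:
    """Share of alphabetic words that contain no vowel at all."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 1.0
    no_vowel = sum(1 for w in words if not re.search(r"[aeiouyAEIOUY]", w))
    return no_vowel / len(words)


def score_incomplete_question(text: str) -> float:
    """Treat very short posts (fewer than three tokens) as incomplete."""
    return 1.0 if len(text.split()) < 3 else 0.0


# Hypothetical label registry: label name -> scoring function.
SCORERS = {
    "personal_identifiers": score_personal_identifiers,
    "gibberish": score_gibberish,
    "incomplete_question": score_incomplete_question,
}


@dataclass
class FeedRecord:
    """One entry of the classification feed for a single piece of content."""
    content_id: str
    scores: dict[str, float]
    flagged: list[str]  # labels whose score reached the threshold


def classify(content_id: str, text: str, threshold: float = 0.5) -> FeedRecord:
    """Score the text against every label and list the labels that fire."""
    scores = {label: scorer(text) for label, scorer in SCORERS.items()}
    flagged = sorted(label for label, s in scores.items() if s >= threshold)
    return FeedRecord(content_id, scores, flagged)
```

A moderation queue could then consume only the records whose `flagged` list is non-empty, while downstream consumers such as a recommender could read the raw `scores` and apply their own thresholds.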
The talk will cover some of the modeling and engineering aspects of building such NLP systems and is accessible without any prior knowledge of the field.
In particular, we will cover the definition of content quality from an educational point of view, the challenges of correctly classifying such content using machine learning, a few tips on how to leverage state-of-the-art NLP, and some data-centric MLOps aspects related to data labeling and human-in-the-loop workflows.