Toloka

Type of site	Platform
Founded	2014
Owner	Nebius Group
Founder(s)	Olga Megorskaya
Industry	Artificial intelligence; information technology
URL	toloka.ai

Toloka is a crowdsourcing and generative AI services provider based in Amsterdam, the Netherlands.^[1] Olga Megorskaya is the current chief executive officer (CEO) of the company.^[1]^[2]

Toloka was founded in 2014, primarily to provide data labeling for improving machine learning and search algorithms. As generative AI evolved, the platform adapted to provide expert data labeling to developers of generative AI applications.^[3] It supports AI development from model training to evaluation and provides services related to generative AI and large language models.^[4]^[5]

History

Toloka was founded in 2014 by Yandex as a crowdsourcing and microtasking platform.^[6]

In 2024, Yandex N.V., Toloka's Netherlands-based parent company, was renamed Nebius Group. Since then, the Russian version of Toloka has been renamed Yandex Tasks and operates as a separate entity from Toloka.^[5]^[7]

Services

Generative AI

In the generative AI domain, Toloka provides services such as model fine-tuning, reinforcement learning from human feedback (RLHF), model evaluation, and ad hoc datasets, all of which require large volumes of annotation by highly skilled experts.^[8]

Machine learning

On Toloka, trainers identify the presence or absence of specified objects in content.^[6]^[9] They also assess chatbot responses within given dialogues for relevance and engagement.^[10] Translation verification tasks involve evaluating the accuracy of translations produced by multiple annotators. For fine-tuning large language models (LLMs), experts write context-based prompts, either single-turn or multi-turn, covering a range of domains and purposes.^[11]

Natural language processing

In the natural language processing (NLP) domain, Toloka facilitates text recognition and classification, sentiment analysis, named-entity recognition, and search relevance evaluation. It also provides transcription and classification of audio data.^[6]

Annotators

Toloka mainly works with domain experts, such as physicists and other scientists, lawyers, and software engineers, to develop specialized data for models targeting niche tasks.^[1] Toloka also works with freelancers, referred to as "Tolokers," who annotate and create data for diverse applications. They perform tasks such as labeling personally identifiable information for AI projects, translating content, summarizing information, and transcribing audio to text.^[1]

On completing each task, the performer receives a reward based on the volume of images, videos, or unstructured text processed.^[6]

Research

In May 2019, Toloka's research team began publishing datasets for non-commercial and academic use, both to support the scientific community and to attract researchers to Toloka. The datasets target research areas such as linguistics, computer vision, the testing of result aggregation models, and chatbot training.^[12] Toloka research has been presented at conferences including the Conference on Neural Information Processing Systems (NeurIPS),^[13] the International Conference on Machine Learning (ICML),^[14] and the International Conference on Very Large Data Bases (VLDB).^[15]

In February 2024, Toloka gave a tutorial at the AAAI Conference on Artificial Intelligence on aligning large language models to low-resource languages.^[16] The company also participated in BigCode, a joint scientific initiative led by Hugging Face and ServiceNow, in which it served as the primary data partner.^[17]

References

[1] Shrivastava, Rashi (July 24, 2024). "The Internet Isn't Big Enough To Train AI. One Fix? Fake Data". Forbes.
[2] Sacolick, Isaac (April 8, 2024). "How to test large language models". InfoWorld.
[3] Baidakova, Daria (September 29, 2021). "Data-Labeling Instructions: Gateway to Success in Crowdsourcing and Enduring Impact on AI". Data Science Central. Retrieved September 17, 2022.
[4] "AI development from training to evaluation". Bloomberg.com. July 16, 2024.
[5] Sawers, Paul (July 21, 2024). "From Yandex's ashes comes Nebius, a 'startup' with plans to be a European AI compute leader". TechCrunch.
[6] Woodie, Alex (April 27, 2021). "Toloka Expands Data Labeling Service". Datanami. Retrieved September 17, 2022.
[7] "Yandex founder to build AI business in Europe after Russia exit". Financial Times.
[8] "Toloka.ai services".
[9] Bussler, Frederik (December 7, 2021). "Data labeling will fuel the AI revolution". VentureBeat. Retrieved September 17, 2022.
[10] Gandharv, Kumar (April 29, 2021). "Why Are Data Labelling Firms Eyeing Indian Market?". Analytics India Magazine. Retrieved September 17, 2022.
[11] Koshelev, Sergey (October 24, 2023). "Diversity first: how we craft creative writing prompts for fine-tuning GenAI".
[12] "Toloka to present new dataset at prestigious Data-Centric AI workshop launched by Andrew Ng". The AI Journal. Retrieved September 17, 2022.
[13] "Toloka to present new dataset at prestigious Data-Centric AI workshop launched by Andrew Ng". FE News. November 18, 2021. Retrieved February 10, 2022.
[14] "Toloka". icml.cc. Retrieved February 10, 2022.
[15] "VLDB 2021 Challenge". crowdscience.ai. Retrieved February 10, 2022.
[16] "The 38th Annual AAAI Conference on Artificial Intelligence".
[17] "BigCode Governance Card". arXiv.

External links

Official website
