Pakistan, an AI Superpower?

Singapore in the 1960s was a small British trading port populated by a mix of peoples and languages. Many of the Malays, Chinese and Indians who lived on the tiny island were poorly educated. Only a minority elite was educated, such as the British colonizers and the Singapore-born Lee Kuan Yew, who became the island’s first prime minister. This visionary spotted an opening that catapulted little Singapore into the ranks of the world’s financial centers.

As he recounts in his book “From Third World to First”, Singapore became a first-world nation in one generation. One reason was a gap he found in the international financial market. When the financial markets in Zurich, Switzerland, closed for the day, Frankfurt took over. When Frankfurt closed, London took over. When London closed, Wall Street in New York took over. When New York closed, San Francisco took over. And when San Francisco closed, there was a gap until Zurich opened the next morning.

To close this gap, Lee Kuan Yew pitched Singapore as the bridge that would complete a 24-hour, round-the-world financial circle. The resulting influx of foreign capital transformed this minuscule island state into one of the four fast-growing Asian Tigers. Similarly, Imran Khan’s administration has an opportunity to bridge a gap and transform Pakistan into a developed nation.

No, not a financial gap; that opportunity is gone. Rather, Pakistan has an excellent opportunity to become the Saudi Arabia of data annotation. International Data Corporation (IDC) forecasts that spending on Artificial Intelligence (AI) will grow from US$12 B in 2017 to US$58 B by 2021. There are many profitable roads a country can travel to become a major player in AI. And yes, if Pakistan gets into AI engineering, that alone is a multi-billion-dollar software opportunity. Here, however, we are talking about a more modest, billion-dollar opportunity that even low-skilled people without a high-school diploma can pursue: data annotation.
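As a rough sense of scale, the IDC figures imply the following compound annual growth rate over the four years from 2017 to 2021. This is our own back-of-the-envelope arithmetic, assuming steady compound growth, not a number IDC published:

```latex
\text{CAGR} = \left(\frac{58}{12}\right)^{1/4} - 1 \approx 1.483 - 1 \approx 48\%\ \text{per year}
```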

What is data annotation?

Data annotation is the process of labeling data, whether text, audio, images or video. Today, the bulk of annotation, whether for driverless cars or for smart speakers such as Google Home, is done by humans. AI applications trained on this pre-labeled data can then recognize similar patterns in new data.
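To make the idea concrete, here is a minimal sketch, not how any production system works. Each annotated example pairs a two-number feature vector with a human-assigned label; the features and values are hypothetical, invented for illustration. A simple nearest-neighbour rule, standing in for real model training, labels new data by finding the most similar annotated example.

```python
import math

# Hypothetical annotated data: (features, label) pairs.
# The two features are invented for illustration (e.g. "size" and "stripiness");
# real annotation attaches labels to raw images, audio or text.
ANNOTATED = [
    ((0.1, 0.9), "bee"),
    ((0.2, 0.8), "bee"),
    ((0.9, 0.1), "automobile"),
    ((0.8, 0.2), "automobile"),
]


def predict(features, examples=ANNOTATED):
    """Return the label of the closest human-annotated example (1-nearest neighbour)."""
    _, label = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return label


if __name__ == "__main__":
    # A new, unlabeled data point that resembles the annotated bees.
    print(predict((0.15, 0.85)))  # → bee
```

Without the human-supplied labels, the program would have nothing to compare new data against, which is why annotation has to come before training.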

An AI or machine-learning ecosystem works much like a food ecosystem. Before you can eat a home-cooked meal of chapati and chicken tikka masala, a chef has to cook the raw ingredients. Before cooking, he has to wash, cut, clean and prep them. Before prepping, he has to buy them. And before buying, a farmer has to grow the wheat or raise the chickens: yes, Imran Khan’s chicken-and-egg vision makes a great analogy for the AI opportunity!

AI workflows are similar: data generation, collection, wrangling, annotation, transformation and training all lead to the final ‘cooked’ product, an AI model you can use. Fortune 500 companies such as Google, Amazon and Microsoft easily spend hundreds of millions of dollars annually on annotation off-shored to the Philippines, India, China, Romania and other low-cost countries. Even on a smaller scale, there are annotation start-ups like Scale AI, which earns US$100 M annually. And that figure covers only annotation labor; it excludes the additional billion dollars spent on building software tools that make annotation work faster.
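The workflow described above can be sketched as a chain of functions, each stage feeding the next. This is a toy illustration under loud assumptions: the stage functions, the sample sentences and the `HUMAN_LABELS` dictionary (standing in for work done by human annotators) are all hypothetical, and “training” here simply counts which words appear under which label.

```python
from collections import Counter, defaultdict

# Hypothetical labels a human annotator would supply for each cleaned sentence.
HUMAN_LABELS = {
    "the chicken tikka masala was delicious": "food",
    "the car stopped at the red light": "traffic",
    "fresh chapati with kebab": "food",
}


def generate():
    """Data generation: raw text as users might produce it (invented examples)."""
    return ["The chicken tikka masala was delicious!  ",
            "the car stopped at the red light.",
            "Fresh chapati with kebab",
            "The car stopped at the red light."]


def collect(raw):
    """Collection: gather the raw records into one dataset."""
    return list(raw)


def wrangle(records):
    """Wrangling: trim, drop trailing punctuation, lowercase, de-duplicate (order kept)."""
    cleaned = [r.strip().rstrip(".!").lower() for r in records]
    return list(dict.fromkeys(cleaned))


def annotate(sentences):
    """Annotation: attach the (hypothetical) human-supplied label to each sentence."""
    return [(s, HUMAN_LABELS[s]) for s in sentences]


def transform(labeled):
    """Transformation: turn each sentence into a bag of words."""
    return [(Counter(s.split()), label) for s, label in labeled]


def train(examples):
    """Training (toy): count how often each word appears under each label."""
    model = defaultdict(Counter)
    for words, label in examples:
        model[label].update(words)
    return model


def classify(model, sentence):
    """Use the 'cooked' model: pick the label whose word counts overlap most."""
    words = sentence.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))


if __name__ == "__main__":
    model = train(transform(annotate(wrangle(collect(generate())))))
    print(classify(model, "kebab and chapati"))  # → food
```

Each stage only does its own job, which mirrors the food analogy: skip the annotation step and there is nothing for the training step to learn from.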

Why data annotation?

Anyone who has a maid to prepare food knows that this is low-skilled, low-wage work. Similarly, a data annotator is a low-skilled, low-paid worker who averages US$300 a month. One step up, a student with moderate English skills could earn US$500 a month as an annotator today on websites such as Amazon’s mturk.com or figure8.com. And in medical annotation, monthly pay may be around US$800. So why should we, as a country, travel the data annotation road to becoming an AI superpower instead of taking the AI engineering or even the AI research route?

Simply because this path has a low barrier to entry for millions of Pakistanis who don’t have a high-school diploma, and it is even easier for English-speaking college graduates. It’s true that the US$1,000 a month an AI engineer earns is more attractive. But data annotation is easy for our low-skilled workforce: it literally means doing the same tedious task day after day.

For certain online data annotation tasks, English requirements are minimal (although many tasks do require good English). Some annotation tasks require almost no English at all and only ask annotators to identify and mark photos: Is this a bee? Is it an automobile? Is this chicken tikka masala or a kebab? These are skills even teenagers can learn in a few hours.

By contrast, jumping to the next level in the AI space, AI engineering, requires a few hundred hours of learning and work experience for college graduates. That is still very doable in a few months, but the barrier to entry is much higher. And getting into AI research requires Master’s- or PhD-level expertise, or thousands of hours (i.e., years) of learning and work experience.

Job creation potential

According to Gartner, by 2020 AI will become a positive net job motivator, creating 2.3 M jobs. For Pakistan, this means employment for its large pool of cheap, low-skilled workers, a national resource much like oil. Attracting data annotation work with these low-cost workers would create a snowball effect, raising demand in the country for data engineers, data architects, AI engineers, functional data scientists and physical infrastructure. In other words, data annotation is the first mile of a journey that will naturally take the country up the AI food chain, toward developing our own AI engineers and, yes, even AI scientists.

One example is annotation for natural language processing: Amazon’s voice assistant, Alexa, knows just seven languages, and Urdu is not one of them, so supporting it would require Urdu language annotation. The use cases for annotated data are endless, limited only by your imagination and your business model for monetizing them. They include annotating images and videos, for instance for Waymo, the self-driving company owned by Google’s parent Alphabet, to mark cars, pedestrians and street signs, as well as images for farming or medicine.

Individuals with basic English skills can jump straight into annotation by signing up at Elance, mTurk or Figure 8. Businesses can use those same websites to connect secondary-school graduates with annotation jobs. Government could fuel this journey by borrowing a page from Singapore’s growth playbook: introduce favorable annotation policies, enable high-speed Internet access, give grants, and create tax-shelter programs and incentives that welcome international data annotation businesses to tap Pakistan’s ‘oil’, i.e., its low-cost data annotators.

Regardless of the route you follow – individual, business or state – Pakistan certainly has a wide-open road that can lead it to becoming a dominant AI player. The road to avoid is the one that leads Pakistan away from becoming an AI nation.