Microsoft's new MS MARCO dataset may help AI systems talk like humans

Written By DNA Web Team | Updated: Dec 18, 2016, 10:34 AM IST

Microsoft's new dataset is called Microsoft MAchine Reading COmprehension.

Microsoft has released a set of 100,000 questions and answers that will allow scientists to create artificial intelligence (AI) systems that can answer queries as well as a human can.

The dataset, called Microsoft MAchine Reading COmprehension (MS MARCO), is based on anonymised real-world data.

By making it broadly available to researchers, the team is hoping to spur the kind of breakthroughs in machine reading that are already happening in image and speech recognition.

They also hope to facilitate the kind of advances that may lead to the long-term goal of 'artificial general intelligence,' or machines that can think like humans.

"In order to move towards artificial general intelligence, we need to take a step towards being able to read a document and understand it as well as a person," said Rangan Majumder, a partner group programme manager with Microsoft's Bing search engine division who is leading the effort.

Right now, systems to answer sophisticated questions are still in their infancy. Some can answer basic questions, like 'What day does Hanukkah start?' or 'What's 2,000 times 43?', said Majumder.

However, in many cases, search engines and virtual assistants will instead point the user to a set of search engine results.

Users can still get the information they need, but it requires culling through the results and finding the answer on the web page.

In order to make automated question-and-answer systems better, researchers need a strong source of what is called training data.

These datasets can be used to teach AI systems to recognise questions and formulate answers, and eventually to create systems that can come up with their own answers to unique questions they have not seen before.