This week, citing sources familiar with the matter, BuzzFeed News reported that Facebook is developing an AI tool that will summarize news articles so users don’t have to read them. The tool, code-named “TLDR” after the acronym for “too long, didn’t read,” reportedly reduces articles to bullet points and offers comments and a virtual assistant to answer questions.
The media industry, which is in the midst of a historic decline in revenue, did not react kindly to the report. Facebook’s relationship with publishers has been strained even in the best of times, with the latter accusing the former of profiting from their work. Facebook launches like Instant Articles, which deprived publishers of recirculation and monetization opportunities, as well as algorithmic changes that deprioritized publisher content in favor of “meaningful interactions,” widened that rift. Just this week, the New York Times reported that Facebook was planning to roll back a change that had promoted reliable news coverage after the election. This development follows reports that the company adjusted its News Feed algorithm in 2017 to lower the visibility of “liberal” outlets like Mother Jones in order to ward off allegations of bias against conservative media.
Facebook has gone so far as to say that it would block the sharing of local and international news on its products if legislation requiring tech platforms to pay publishers for content became law, but tools like TLDR would make such a block unnecessary. By condensing news articles into bite-sized summaries that would likely live on Facebook, TLDR would most likely further reduce click-through rates to publishers. It is already estimated that around 43% of adults in the US get their news from Facebook. If a tool like TLDR removes the incentive to visit the source, that percentage could rise.
Facebook could argue that summaries would lead to more informed discussion on its platform, given that around 59% of links shared on social media have never been clicked. However, a large body of work shows that natural language processing algorithms, such as those likely underlying TLDR, are prone to bias. Often, some of the training data for these algorithms comes from online communities with pervasive prejudices about gender, race, and religion. AI research firm OpenAI notes that this can result in words like “naughty” or “sucked” being placed near feminine pronouns and “Islam” near words like “terrorism.” Other studies, such as one published in April by Intel, MIT, and Canadian AI initiative CIFAR, have found high levels of stereotypical bias in some of the most popular models, including Google’s BERT and XLNet, OpenAI’s GPT-2, and Facebook’s RoBERTa.
To be fair, some companies, including OpenAI, have had some success in the area of AI summarization. In 2017, Salesforce researchers published a paper describing a summarization algorithm that learns from examples of high-quality summaries. It uses a mechanism called “attention” to avoid generating too many repetitive strings of text. More recently, OpenAI managed to train a machine learning reward model on a Reddit dataset to predict which summaries people would prefer, and then to fine-tune a language model to generate summaries that score highly according to that reward model. The quality of the resulting news article summaries was rated by a team of human reviewers.
However, a perfect summary of a text would require real intelligence, including general knowledge and a command of language. And while algorithms like OpenAI’s GPT-3 push the envelope in this regard, they are far from reasoning at a human level. Researchers affiliated with Facebook and Tel Aviv University recently found that a pre-trained language model (GPT-2, the precursor to GPT-3) failed to follow basic natural language instructions. In another example, scientists from Facebook and University College London found that 60% to 70% of the answers given by models tested on open-domain benchmarks were embedded somewhere in the training sets, indicating that the models had simply memorized the answers.
Beyond that, Facebook has a poor track record with AI when it comes to offensive content, which doesn’t instill much trust in tools like TLDR. According to BuzzFeed, a departing employee estimated earlier this month that the company, even with AI and third-party moderators, “deleted less than 5% of all hate speech posted on Facebook.” (Facebook later pushed back on that claim.)
It remains to be seen what TLDR will look like, how it will be deployed, and which publishers may ultimately be affected. However, the evidence points to a problematic and potentially ill-considered rollout. BuzzFeed recently quoted Facebook CTO Mike Schroepfer as saying that the company “needs to build [tools such as TLDR] responsibly” to “earn trust” and “the right to keep growing.” So far, Facebook has clearly not earned that trust, either in the AI space or in other areas like advertising and acquisitions.
For AI coverage, send news tips to Khari Johnson, Kyle Wiggers, and Seth Colaner – and be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel.
Thank you for reading,
AI Staff Writer