Getting Into Google News: What You Need to Know

Robin Fry

January 17, 2020

What Is Google News, and How Can I Get My Content In It?

Google News was created in 2002 to help Google manage and deliver the news stories that appear in its search results, and to make them easier for its users to find and digest. Since then, in typical Google fashion, it has become one of the largest news aggregators online, serving content from over 75,000 sources in 62 languages across the world.

Google News results appear at the top of search results, above the organic search listings for a query, as well as in their own dedicated News tab in Google. As a result, the articles and publications that appear in Google News are far more prominent in search, and far more likely to get clicked on, than those that aren’t.

If you have a large amount of content on your website that can reasonably be classified as ‘news’, and this content is being produced on a frequent basis, then you should consider applying to be included in Google News results. Be warned though: Google News has a list of stringent requirements that a website must meet in order to be included. If you want to get into Google News, first read the guidelines below to make sure your content is eligible and to give yourself the best possible chance of qualifying.

Update: On 28th May 2020, Google announced that, as part of their new focus on page experience, they will also be updating the eligibility criteria for Top Stories in mobile search results. According to the statement:

“AMP will no longer be necessary for stories to be featured in Top Stories on mobile; it will be open to any page. Alongside this change, page experience will become a ranking factor in Top Stories, in addition to the many factors assessed. As before, pages must meet the Google News content policies to be eligible.”

While these changes aren’t due to be rolled out until 2021, they give websites even more incentive to meet the Google News guidelines, so that their content can be displayed in this highly competitive area of Google search results.

Make Sure Your Pages Can Be Found

This one might seem obvious – why would you waste time creating web content only for no one to even find it? But mistakes, technical issues or just plain old bad SEO can stop your pages from appearing in search results, and Google isn’t going to direct anyone to inaccessible content.

Link to every page

You should have already linked to your content from other pages of your site to make sure Google’s bots can crawl to it and add it to its index. But it pays to do a thorough check. Are there orphan pages on your site that aren’t linked to from anywhere else? Maybe they contain out-of-date content that you could just delete or archive. From an SEO perspective, I’d always advise refreshing old pages rather than just getting rid of them. That said, if the content no longer serves any purpose then it’s better to delete it and permanently (301) redirect the URL to a more valuable page. That way, you provide a more satisfying user journey and retain some of the benefit of any links to the old content.

You should also make sure all links to internal pages are HTML links, as these can be crawled easily by Googlebot-News (the robot specifically assigned to crawl News pages). Using JavaScript to link to pages may make those links inaccessible to the crawler. If those are the only links to those pages on your website, then you’ve basically locked Google out, even if those pages are perfectly accessible to your users – which they may not be if they have JavaScript disabled on their browser.
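If you have a copy of your pages’ HTML (from a crawl, or an export from your CMS), a short script can flag orphan pages for you. The Python sketch below is one way to do it, assuming a hypothetical pages dictionary that maps each URL to its HTML. It only counts plain HTML <a href> links, which mirrors the point above: a link that only exists in JavaScript doesn’t count. It also assumes your URLs are written consistently (trailing slashes and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the targets of plain HTML <a href> links on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def find_orphans(pages: dict[str, str]) -> set[str]:
    """Return the URLs in pages that no other page links to with an HTML link."""
    linked: set[str] = set()
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        collector.close()
        linked |= collector.links - {url}  # a page linking to itself doesn't count
    return set(pages) - linked


# Hypothetical site: story-2 is only "linked" via a JavaScript onclick handler.
pages = {
    "https://www.website.com/": '<a href="/news/">News</a>',
    "https://www.website.com/news/": (
        '<a href="/news/story-1">Story 1</a>'
        "<span onclick=\"location='/news/story-2'\">Story 2</span>"
    ),
    "https://www.website.com/news/story-1": '<a href="/">Home</a>',
    "https://www.website.com/news/story-2": "<p>Only reachable via JavaScript</p>",
}
print(find_orphans(pages))
```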

Check whether pages are indexed

You can do a ‘site:’ search in Google to see if it has indexed a particular page. Just type site:www.website.com/mypage (replacing the domain and page path with the real ones you’re looking for) into Google and it will show you any matching result(s) it has in its index. If the URL is pretty new, it may take a few days or weeks to appear in the index. You can always use the URL Inspection tool in Google Search Console (which replaced the older ‘Fetch as Google’ function) to request indexing and give Google a bit of a nudge, but it usually indexes new content within that timeframe anyway. If it doesn’t, or you’re looking for a page that’s been around for a while but isn’t showing up as indexed, there may be a problem.

Don’t block valuable content

There are a few reasons why Google might not index a page. The first thing to check is whether you’re actually blocking the page from being crawled or indexed by Google. Check the robots.txt file in the root folder of your website. Is there a Disallow rule relating to the URL you’re looking at? This tells Google that you don’t want it to crawl that page (a blocked URL can still occasionally turn up in the index if other pages link to it, but you shouldn’t block a page from being crawled if you want people to find it in search engines). Look out for the wildcard ‘*’ character; it could be ‘accidentally’ telling Google not to visit a group of pages that you actually want it to find.

I’ve seen some big-brand clients use just these two lines in robots.txt files before:

User-agent: *
Disallow: /

That basically asks all robots (thanks to the wildcard character in the ‘User-agent’ line) to not visit any pages of your site (as the forward-slash in the ‘Disallow’ line refers to everything in the root folder of the domain). Most of them will oblige, meaning you’ve effectively banished yourself from search results altogether.
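To check a robots.txt file against the URLs you care about, Python’s standard library includes a robots.txt parser, urllib.robotparser. The sketch below asks it whether Googlebot-News may fetch each URL; the example rules and URLs are hypothetical. Bear in mind that this parser uses simple prefix matching and doesn’t implement Google’s own handling of ‘*’ and ‘$’ inside paths, so treat it as a quick sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot-News") -> list[str]:
    """Return the URLs that the given robots.txt asks user_agent not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# The two-line file described above, and a more targeted alternative.
block_everything = "User-agent: *\nDisallow: /\n"
block_images = "User-agent: *\nDisallow: /images/\n"

urls = [
    "https://www.website.com/news/story-1",
    "https://www.website.com/images/logo.png",
]

print(blocked_urls(block_everything, urls))  # both URLs are blocked
print(blocked_urls(block_images, urls))      # only the image is blocked
```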

Robots.txt files are very useful, but you should only use them to keep crawlers out of areas that you really don’t want them to visit: basically, any files that are private or of little value to a visitor, or that you don’t want crawlers to waste time trawling through (using up ‘crawl budget’ that would be better spent on more important pages). For example, you might want to stop Google and other search engines from crawling through the image folder on your website if there’s nothing in there that’s particularly useful for search, or you just don’t want those images to be found. It’s best to keep those types of files or pages in separate folders and block those folders entirely, rather than risk blocking the wrong things.

Meta robots tags can also be added to the code of individual pages to tell search engines not to index them. Sometimes these are included while the page is being built, so that it can be tested as if it were ‘live’, without appearing in search results before it’s ready. If they aren’t removed when the page goes live then it will remain invisible to search engine users.

Check the <head> section of the HTML code of your web page. If you find any of the following tags, they may explain why a page (or the pages it links to) isn’t getting into Google’s index:

  • <meta name="robots" content="noindex"> – asks search engines not to index the page
  • <meta name="robots" content="nofollow"> – asks robots not to follow the links on the page
  • <meta name="robots" content="noindex, nofollow" /> – asks for the page not to be indexed and for its links not to be followed

As with the robots.txt file, robots and search engines can choose to ignore these tags, but in most cases they will respect your wishes and leave the pages out of search results. Make sure you don’t have a Meta robots tag blocking any important pages of content.
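If you have a lot of pages to check, a few lines of Python can scan each page’s HTML for robots meta tags. This is a minimal sketch using the standard library’s html.parser; it only looks at tags with name="robots" (not crawler-specific variants), and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set[str]:
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


page = '<head><meta name="robots" content="noindex, nofollow" /></head>'
if "noindex" in robots_directives(page):
    print("This page is asking search engines not to index it")
```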

Google News will not include any content that is blocked to robots. If you’re blocking multiple pages of editorial content, and there doesn’t appear to be a valid reason for it, your website is unlikely to qualify for inclusion.

Avoid duplicate content

Google also might not index a page if it has already indexed one just like it. After all, why should it waste server space on something it has already stored a copy of? Be careful not to duplicate content on your site. If two pages have exactly the same content, one of them might not be indexed. Even if they are both indexed, they’ll then be competing with each other for the same searches, and holding each other back.

If you have multiple pages with the exact same content on, ask yourself why. If there’s no good reason for it, then check which, if any, are indexed (using a ‘site:’ search as described above). If there’s just one version that Google has indexed, 301 redirect all the other, duplicate, URLs to that page. If there are multiple – or no – versions of that page indexed, then keep the oldest version of that page and redirect the others to that.
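The decision above can be written down as a small routine. The Python sketch below assumes you already have, for each URL, its main body text, its publication date and whether a site: search showed it as indexed (the Page record and the example URLs are hypothetical). It treats pages as duplicates only when their text matches after collapsing whitespace and letter case, so near-duplicates won’t be caught.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    text: str        # main body text of the page
    published: date
    indexed: bool    # e.g. the result of a manual site: search


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and case, so trivial differences are ignored."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def redirect_plan(pages: list[Page]) -> dict[str, str]:
    """Map each duplicate URL to the URL it should be 301-redirected to."""
    groups: dict[str, list[Page]] = defaultdict(list)
    for page in pages:
        groups[content_fingerprint(page.text)].append(page)

    plan: dict[str, str] = {}
    for group in groups.values():
        if len(group) < 2:
            continue  # unique content, nothing to do
        indexed = [page for page in group if page.indexed]
        if len(indexed) == 1:
            keeper = indexed[0]  # exactly one indexed version: keep that one
        else:
            keeper = min(group, key=lambda page: page.published)  # otherwise keep the oldest
        for page in group:
            if page.url != keeper.url:
                plan[page.url] = keeper.url
    return plan
```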

If there is a legitimate reason for having two accessible URLs with the same content that you can’t get around (a requirement to have the same page in two different category directories, for example), then you can fix the duplicate content issue with canonical tags. These tell search engines to treat one specific URL as the ‘default’ version of a page, and to treat every other version carrying a canonical tag that points to it as if it were that URL.

Canonical tags should be applied in a similar way to the redirect recommendation above – give all identical pages a canonical tag to the URL of the oldest, or the only indexed, version of those pages. The tag should be placed in the <head> section of a page’s HTML code and should look like this (using your own default URL in place of the example one):

<link rel="canonical" href="https://www.website.com/mypage/" />

One last, very important piece of advice on the subject of duplicate content: never, ever plagiarise your content from someone else’s site. Not only is it terribly bad form, as well as a probable copyright infringement, but if Google finds that you’ve just copied your content from elsewhere on the web then it will take a very dim view of the quality of all of your content. Even if you escape retribution from the original author, you’re very likely to be punished by Google with poor performance in search results. Plagiarised content will definitely not qualify for inclusion in Google News, so you’ll be onto a loser there too.

Write good content

Another, hopefully less likely, reason that Google still hasn’t indexed content after a few weeks is that the content itself just isn’t up to scratch. If you’ve ruled out the more technical blockers above, take a look at the page and ask yourself:

  • Does it have a low word count? Fewer than 200 words is bad; I recommend at least 500 words per page
  • Are there spelling or grammar errors? Google wants to serve the best quality content to its users, so it will be pedantic. Make sure all content is properly proofread, preferably before it gets published to the site
  • Is the information correct? Again, Google wants to give its users accurate information, and is becoming more vigilant on this after taking flak for featuring ‘fake news’ results in the last few years. Make sure you’re not giving false information, and cite your sources wherever necessary – this is particularly important if you want to qualify as a Google News site
  • Is it useful? If Google can’t see the point of your content, if it’s boring, or if it’s just not really telling anyone anything, it probably won’t bother indexing it
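The word-count guideline is easy to check automatically. This Python sketch strips the tags from a page’s HTML with the standard library’s html.parser and counts the remaining words, ignoring script and style blocks. It counts all the text it’s given, including menus and footers, so it assumes you feed it just the article body; the 200- and 500-word thresholds are the ones suggested above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Gathers text content, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_count(html: str) -> int:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Join chunks with spaces so text from adjacent elements isn't glued together.
    return len(" ".join(collector.chunks).split())


def length_verdict(words: int) -> str:
    if words < 200:
        return "too short"
    if words < 500:
        return "below the recommended 500 words"
    return "ok"
```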

I would hope that your content is well-written, authoritative and has something valuable to say, especially if you think it’s worthy of being called ‘news’. But if it’s guilty of any of the crimes listed above, it’s time to have a rethink.

Google News has extremely high standards for the content it includes, so even if your content has been indexed, it’s essential to make sure it passes muster when it comes to quality.

Have Permanent Category URLs

Your top-level News pages should stay where they are. If the URLs of your main News landing page or the category pages change on a regular basis, this will cause problems when Googlebot-News tries to crawl your content. It’s also bad for SEO, so I recommend you don’t do it at all, on any part of your site, if you can help it.

You can update the record of your section URLs in the Google News Publisher Center, but it will make things much easier for Google, and for you, if your main URLs never change. Don’t include date or time references in your main URLs (it’s not such a big deal for time-specific articles), or any other elements that need to be changed frequently or change automatically based on circumstances. This will help Google News find your new content and pages quickly and easily.
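If you want to audit your section URLs for date references, a rough check like the Python sketch below will flag path segments that are a year on their own (e.g. /2020/) or contain a YYYY-MM-DD date. It’s only a heuristic, not something Google publishes, and it won’t catch every date or time format; the example URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

YEAR_SEGMENT = re.compile(r"(?:19|20)\d{2}")           # a segment that is just a year
FULL_DATE = re.compile(r"(?:19|20)\d{2}-\d{2}-\d{2}")  # a YYYY-MM-DD date anywhere in a segment


def date_like_segments(url: str) -> list[str]:
    """Return the path segments of url that look like dates."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    return [s for s in segments if YEAR_SEGMENT.fullmatch(s) or FULL_DATE.search(s)]


for section_url in [
    "https://www.website.com/news/business/",
    "https://www.website.com/news/2020/business/",
]:
    print(section_url, date_like_segments(section_url))
```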

 

Use UTF-8 Encoding

Google News prefers web pages to be Unicode (UTF-8) encoded. This is pretty standard for web content so you’re probably doing it already, but check your site or ask your web developer to be absolutely certain.

You should also make sure your character encoding is declared in the HTML. Again, this is standard practice so you may well be covered already. You should look for the following tag, immediately after the opening <head> tag of the page code:

<meta charset="utf-8" />

If you can’t see it, and you’ve confirmed that the site is definitely UTF-8 encoded, then make sure it’s added to every page.
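To check this across many pages, you could run something like the Python sketch below over each page’s raw bytes. It checks two things: that the page decodes cleanly as UTF-8, and that a <meta charset="utf-8"> declaration appears within the first 1,024 bytes, which is where the HTML specification expects it. It only recognises that short form of the tag, not the older http-equiv version, and the sample pages are made up.

```python
import re

# Matches the short form <meta charset="utf-8"> (quotes optional, any letter case).
UTF8_META = re.compile(rb"""<meta\s+charset\s*=\s*["']?utf-8""", re.IGNORECASE)


def check_utf8(page: bytes) -> dict[str, bool]:
    try:
        page.decode("utf-8")
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    return {
        "decodes_as_utf8": decodes,
        # Only the first 1,024 bytes are searched for the declaration.
        "declares_utf8_early": UTF8_META.search(page[:1024]) is not None,
    }


good_page = b'<html><head><meta charset="utf-8" /><title>Caf\xc3\xa9</title></head></html>'
print(check_utf8(good_page))
```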

Be Mobile-Friendly

There are a million reasons why your site should be mobile-friendly, and this is just one of them. Google News won’t include content that gives a poor experience for mobile users, no matter how worthy that content might otherwise be.

Ideally (for the million reasons mentioned above) you should have a responsive website, which automatically tailors the appearance of your web pages to whatever device a visitor is using. Some platforms and CMS automatically incorporate this functionality, but if your site is more than a few years old or was bespoke-built by a web developer, it’s likely to need some work done.

If your site is not responsive, then it either needs to be designed with mobile users in mind, or you need to create a mobile-specific version of your website that visitors are automatically redirected to if they’re using a mobile device. Unless you’ve already got these set up – and, to be frank, even if you have – having a responsive website is the easiest and best option.

If in doubt about whether or not your website is mobile-friendly, you can use Google’s own Mobile-Friendly Test at https://search.google.com/test/mobile-friendly to find out. Google judges each page individually, so it’s worth running the test on several URLs, especially if there are areas of the site that are designed or coded differently to the rest.

A mobile-friendly site will perform much better in search than one that isn’t, which is number one of those million reasons for making sure your site is accessible to mobile users as soon as possible.

 

Keep Pages on the Same Domain

When you submit a site for inclusion on Google News, you are submitting your web domain. It can’t crawl or host articles on an entirely different domain and attribute them to the same source. For example, if your web address is www.website.com, you can’t ask for articles on www.anotherwebsite.com to also be included in the same request.

Google News will include articles on subdomains, e.g. news.website.com, or subdirectories, e.g. www.website.com/news/, as these are still on the same domain.

If you have multiple language versions of your site, each version will have to be submitted for inclusion separately, even if they are on the same domain. Google News will not include a site that presents content in multiple languages on the same page.

Also, Google News will only follow redirects to URLs on the registered domain. If you set up redirects for any of your News pages, point them to pages within the same domain, preferably within the News section of your site. Otherwise they will no longer be included.
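Both rules in this section come down to the same question: does a URL sit on your registered domain? This Python sketch answers it by comparing hostnames, treating the domain itself and any subdomain as a match, and then uses that to flag redirects that point off the domain. www.website.com is the placeholder used above, and the redirect map is hypothetical (in practice you might export it from your server or CMS configuration).

```python
from urllib.parse import urlparse


def on_registered_domain(url: str, registered_domain: str) -> bool:
    """True if url is on registered_domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = registered_domain.lower().removeprefix("www.")
    # The leading dot stops e.g. anotherwebsite.com matching website.com.
    return host == domain or host.endswith("." + domain)


def off_domain_redirects(redirects: dict[str, str], registered_domain: str) -> dict[str, str]:
    """Return the redirects whose targets are not on the registered domain."""
    return {
        source: target
        for source, target in redirects.items()
        if not on_registered_domain(target, registered_domain)
    }


redirects = {
    "https://www.website.com/news/old-story": "https://www.website.com/news/new-story",
    "https://www.website.com/news/moved": "https://www.anotherwebsite.com/story",
}
print(off_domain_redirects(redirects, "www.website.com"))
```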

 

Content Must be Unique and Newsworthy

This really should go without saying, but it would be wrong of me to not mention it. As well as the duplicate content issues mentioned above, along with plagiarism and copyright concerns, Google News is invested in giving its users fresh and unique content. If it’s seen all your content elsewhere, it won’t include it, and rightly so. And if it isn’t news, it simply won’t qualify.

The standard for acceptable content in Google News is high. After all, you’re asking to be part of a service which already includes major national and international news outlets. Your News content needs to be, well, news. If it’s nothing but copies of stories that have already been reported, or opinion pieces on those stories, it won’t cut the mustard. If your content is not topical, no matter how well-written it might be, it won’t qualify to be included. Everyone loves How-To guides, for example, but they aren’t news and won’t be accepted.

Your news content needs to have actual content, in the form of text. Pages that only contain images and/or videos won’t qualify.

Job updates are totally against Google News’ Terms of Service; not only will they stop you from qualifying for inclusion, they could get your site removed if it has been included already. By all means, have a Careers page and link to it from your navigation or footer menu, but don’t submit your job pages to Google News.

Submitting articles that are blatantly just promotional or marketing material for your organisation, or anyone else’s, is also likely to get you kicked out. Don’t do it.

It’s ok to talk about events that other sites have already covered, but you need to provide your own take on those events, and you need to be producing this type of content on a very regular basis. In other words, you and your co-writers have to be journalists. The clue’s in the name. Google News is a place to catch up on the latest news. For everything else, there’s plain old Google.

Don’t Mess Around With Your Articles

Google News is a sensitive little flower that doesn’t like change. Once your articles are posted, it’s best to leave them as they are. Of course, the occasional correction or editorial update will be necessary, but you should really avoid making frequent changes to your posts, particularly in the first two days of publication, as this will confuse Googlebot-News and could cause indexing issues.

If you decide to redesign your site, you’ll need to be really careful with your news pages. Changes to the layout or coding of a page can also cause crawling and indexing problems. Try to make only superficial changes to your news pages if at all possible.

Consider Including Speakable Markup

As it won’t affect your eligibility, this is optional, but highly recommended. The speakable schema property allows you to identify sections of your articles that can be read out by a Google Assistant-enabled device such as Google Home. Google Assistant will use speakable structured data to answer voice search queries on such devices. So if someone asks their Google Home, “What’s the latest news about …” or a similar question about a topic relevant to your marked-up article, it may well read out your selected piece of content.

The speakable schema is quite a recent addition by Google and is very much in Beta at the time of writing, so it’s likely to evolve quite rapidly. At the moment it’s available for any site that’s eligible for Google News (which you should be by now if you’ve been paying attention), to users in the United States who have Google Home devices set to English, and to sites that publish content in English. Google are looking to roll this out to other countries and languages as soon as speakable has been implemented by a “sufficient number” of publishers in a particular country or language, so it’s worth implementing now even if it’s not yet available for your audience.

To set up speakable markup, you’ll need to add schema to each article page. If you’re doing this already on your site for other elements – and again, I strongly recommend that you do, for a multitude of reasons that would take a whole other guide – then you should find it pretty straightforward to add some more. If not, then take some time to learn about schema by visiting the extremely helpful schema.org website. You’ll find it’s pretty easy to implement on most sites, and well worth the effort for all those other reasons.

The speakable property can be used in the Article or WebPage schema type (again, see schema.org for more information). The required properties are:

  • @type, which should be set to SpeakableSpecification
  • Either cssSelector, which identifies the speakable sections using CSS selectors, such as a class attribute like headline or summary

OR

  • xpath, which identifies them using XPaths, assuming there is an XML view of the content, e.g. /html/head/title

This is an example of how the schema would look in the code of a page using xpath and implemented with JSON-LD:

<html>
<head>
<title>Publisher Qualifies for Google News</title>
<meta name="description" content="The world of Google News was rocked today by the announcement that a publisher became eligible for inclusion in Google News after following the steps laid out in a really well-written guide. The publisher is said to be pleased." />
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "WebPage",
  "name": "Publisher Makes the Headlines",
  "speakable": {
    "@type": "SpeakableSpecification",
    "xpath": [
      "/html/head/title",
      "/html/head/meta[@name='description']/@content"
    ]
  },
  "url": "http://www.fantasticnews.com/publisher-qualifies-google-news"
}
</script>
</head>
</html>
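If you generate pages from a CMS or script, it can be safer to build the JSON-LD with a proper JSON library than to write it by hand, so that quotes and special characters are always escaped correctly. This Python sketch produces the same kind of speakable block as the example above; the function name and arguments are hypothetical, and it enforces the ‘either cssSelector OR xpath’ choice described above.

```python
import json
from typing import Optional


def speakable_jsonld(
    name: str,
    url: str,
    xpaths: Optional[list[str]] = None,
    css_selectors: Optional[list[str]] = None,
) -> str:
    """Return a <script> block with WebPage + speakable JSON-LD."""
    # Exactly one of the two ways of pointing at the speakable sections.
    if (xpaths is None) == (css_selectors is None):
        raise ValueError("pass either xpaths or css_selectors, not both or neither")
    spec: dict = {"@type": "SpeakableSpecification"}
    if xpaths is not None:
        spec["xpath"] = list(xpaths)
    else:
        spec["cssSelector"] = list(css_selectors)
    data = {
        "@context": "https://schema.org/",
        "@type": "WebPage",
        "name": name,
        "speakable": spec,
        "url": url,
    }
    # Escape "</" so a value containing "</script>" can't close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = speakable_jsonld(
    "Publisher Makes the Headlines",
    "http://www.fantasticnews.com/publisher-qualifies-google-news",
    xpaths=["/html/head/title", "/html/head/meta[@name='description']/@content"],
)
print(snippet)
```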

If all of the above has left you scratching your head, feel free to skip this step. But it really is worth talking to your web developer(s) about implementing schema on your website at some stage in the near future.

 

How to Apply to Google News

So are you happy your news pages pass all the criteria above? Sure? Then it’s time to submit your website to the Google News Publisher Center.

You’ll need to have an accessible Google Account to do this. You should already have one connected with your website if you’re using Google Search Console, Analytics or Tag Manager. If so, you should use that, as it’s a good idea to keep all your accounts for these tools in one place. If not, you can sign in with an existing Gmail account or create one using any email address.

If the site has been verified in Google Search Console, it will already be listed in the Publisher Center, under ‘My Sites’, in which case you just need to click the ‘Request inclusion in News Index’ button next to the website address on the list. Otherwise, you’ll need to verify your domain. Enter the details of your website, then you’ll be asked to download a verification file and upload it to your site. You’ll need FTP access to your server, or the help of a web developer who has it, to do this. If that’s not easy, there are alternative verification methods available. You can add a Meta tag to the HTML of your home page, or sign in through your domain name provider, Google Analytics or Google Tag Manager to verify that you’re the rightful owner of the domain.

Once you’re verified and have requested inclusion, a form should pop up. You need to fill this in with the following information:

  • A brief description of your news site (required)
  • Your name (required)
  • Your email address (required)
  • The website URL (required)
  • The website name (required)
  • The language the content is written in (required)
  • The city your website is based in (required)
  • The State or Province your website is based in
  • The Region your website is based in (required)
  • The category (or categories) that apply to your site
  • The URL of your main news page (required)
  • The category label that best describes your news section (required)

The description of your news site should be short, factual and just provide an explanation of what your news section covers. The website name should just be the name that appears on the site; don’t add any extraneous information. By ‘Region’, Google actually mean country, which you can pick from a drop-down list (or you can pick ‘World’ if you’re a global publication).

The categories that can be applied to your site are:

  • Opinion content
  • User-generated content
  • Blog
  • Press Release
  • Satire

If none of these apply, don’t worry, you don’t have to tick a box. But you should tick any and all boxes that do apply, or you may not qualify for inclusion.

Category labels are separate from, and different to, the categories above. These appear as a drop-down list and you do have to pick one (there is an ‘Other’ option if you feel none of them apply). The category label describes the main focus of the subject matter of your news pages, e.g. Business, Entertainment, Politics, etc. Google News will use this to group your website with similar sites and to make sure your content is being offered to the most appropriate audience.

If you have multiple news sections on your site, you can add more sections to the form. You can apply a different category label for each section.

You also need to tick the box at the bottom of the form to certify that the country you selected is the legal domicile of your publication and that you are an authorised representative of the publisher.

Once you’ve filled everything in, submit your application! Google News should get back to you within three weeks (usually it’s within a week) to let you know whether your site has met their guidelines and been approved for inclusion.

What Happens If You’re Not Accepted?

As you may have already gathered, the criteria for appearing in Google News are incredibly strict. You can be rejected for a whole host of reasons. I’ve tried to cover as many of them as I can in this guide, but ultimately it’s at Google’s discretion and you won’t be given an explanation as to why you’ve been rejected. If you’re positive you’ve done everything you can to meet all of Google News’ requirements, it might just be the case that, after assessing your content, it wasn’t deemed newsworthy enough or it covered similar ground to existing sites in their index.

All is not lost, however. After 60 days, you can reapply for submission to Google News. You can use that time to check and double-check everything to make sure you’ve got the best chance of qualifying next time round. Take a long, critical look at the content you submitted last time. Is it definitely newsworthy? Are you giving people information or insight they couldn’t find elsewhere just as easily? Even if one or two pages you included for submission aren’t quite right, this could have tipped the balance and rendered all of your content ineligible. Be stricter about what you submit next time.

Google News is a very exclusive club, and it can be really snooty about who it lets in. If your content is rejected multiple times, focus your efforts elsewhere. Make sure your content is optimised for organic search results, where membership is less select. Think about other areas where you can promote your news content, like social media. There are plenty of opportunities to build a huge audience, if you’re creating good content.

Generate a News Sitemap

Once you’ve been accepted for inclusion in Google News, I highly recommend creating and submitting a News sitemap. This is entirely optional, but it will help Googlebot-News to crawl your articles quickly and easily, which in turn enables Google News to index all your content as quickly as possible. You probably already have an XML sitemap for your website – if you don’t, I also highly recommend you get one made as soon as you can. The News sitemap is a version of that which includes just your News pages. It has some additional tags that should be applied to each page:

  • <publication> – this has two child tags:
    • <name> – the name of the publication where the article appears. This must exactly match your publication name on Google News.
    • <language> – the language of the publication. This must use a valid ISO 639 language code, for example en for English.
  • <publication_date> – the date the article was first published. This must use the valid W3C format and include at least the complete date (YYYY-MM-DD), with hours, minutes, seconds and decimal fractions of a second optional.
  • <title> – the title as it appears on the page. Don’t be tempted to add any extra information like the author name or a subheader.

All of the above tags are required elements and must be included for every URL.
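For a sense of how these tags fit together, here is a Python sketch that builds a News sitemap with the standard library’s xml.etree.ElementTree. Beyond the tags listed above, it places them inside a news:news element within each standard url entry (alongside its loc), and declares the standard sitemap and Google News sitemap namespaces, as Google’s News sitemap format does. The Article record, publication name and URL are hypothetical; check the output against Google’s current documentation before relying on it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

# Serialise the sitemap namespace as the default and the News namespace as "news:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)


@dataclass
class Article:
    url: str
    title: str           # exactly as it appears on the page
    published: datetime  # timezone-aware time of first publication


def build_news_sitemap(articles: list[Article], publication_name: str, language: str = "en") -> bytes:
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article.url
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language  # ISO 639 code
        # W3C datetime format, e.g. 2020-01-17T09:30:00+00:00
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article.published.isoformat(
            timespec="seconds"
        )
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article.title
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap = build_news_sitemap(
    [
        Article(
            "https://www.website.com/news/publisher-qualifies",
            "Publisher Qualifies for Google News",
            datetime(2020, 1, 17, 9, 30, tzinfo=timezone.utc),
        )
    ],
    publication_name="Website News",
)
print(sitemap.decode("utf-8"))
```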

The sitemap should be created in the same format as a standard XML sitemap. There are free XML sitemap generators available online; however, I strongly advise against using them for this job, as they will almost certainly add in the non-news pages of your site. Your News sitemap should contain only the URLs of the news articles on your site and nothing else. It also needs to be updated frequently, so it’s worth taking some time to think about how you’ll be tackling this. You could set up and update it manually, but that’s likely to require a heck of a lot of time and effort. Using paid-for sitemap generator software that can create a specific News sitemap may be an easier option.

If your site is built in WordPress, then Yoast have created a plugin that will generate a News sitemap for you, as well as helping you with other aspects of optimising your news content. Other platforms and Content Management Systems may offer similar tools. Screaming Frog allows you to submit a list of the URLs you want to include to generate a sitemap (although you will still need to add the News-specific sitemap tags and elements for each page). The Unlimited Sitemap Generator available from XML-Sitemaps.com is inexpensive and has a News sitemap option.

These are Google News’ requirements for a News sitemap:

  • Update the sitemap with new article URLs as they’re published. If you’ve qualified for Google News then you’re probably producing a large number of articles on a regular basis, in which case having this done automatically is definitely the healthy option.
  • Include URLs of all articles published in the last 2 days. Once they’re older than that, it’s fine to remove them – all articles will remain in the Google News index for 30 days before dropping out.
  • Maximum 1,000 URLs per sitemap. If you want to include more, you’ll need to create multiple sitemaps and list them in a sitemap index file using the XML format. Google says you can list up to 50,000 individual sitemaps in the index, giving you the potential to produce up to 50 million brand new articles every couple of days. Think you’re up to that challenge?
  • Don’t make a new sitemap with each update. At least, not until you’ve reached capacity on the current one. Simply add your new article URLs to the existing sitemap.
  • Only include news articles. No other pages of your site should be included.
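The two-day window and the 1,000-URL limit can be automated too. This Python sketch filters a hypothetical list of (URL, publication time) pairs down to the last two days, fills each sitemap up to 1,000 URLs before starting another, and builds a sitemap index listing them. The sitemap file names are made up, and a real setup would also need to write each chunk out as a News sitemap with the tags listed above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)

MAX_URLS_PER_SITEMAP = 1000
RECENT_WINDOW = timedelta(days=2)


def recent_article_urls(articles: list[tuple[str, datetime]], now: datetime) -> list[str]:
    """Keep only articles first published within the last two days."""
    return [url for url, published in articles if now - published <= RECENT_WINDOW]


def split_into_sitemaps(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Fill each sitemap up to the limit before starting a new one."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


now = datetime(2020, 1, 17, 12, 0, tzinfo=timezone.utc)
# 2,500 articles published one minute apart (all within two days), plus one older story.
articles = [(f"https://www.website.com/news/story-{n}", now - timedelta(minutes=n)) for n in range(2500)]
articles.append(("https://www.website.com/news/old-story", now - timedelta(days=3)))

urls = recent_article_urls(articles, now)
chunks = split_into_sitemaps(urls)
index = build_sitemap_index(
    [f"https://www.website.com/news-sitemap-{i + 1}.xml" for i in range(len(chunks))]
)
print([len(chunk) for chunk in chunks])
```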

Once you have created your News sitemap, it should be saved in the root directory of your website as news-sitemap.xml. Then it should be submitted to Google through the Search Console profile for your website. Again, if you don’t have a profile set up already, I strongly recommend you make one right away, as it’s an incredibly useful resource.

You can submit your sitemap by selecting Sitemaps in the left-hand menu (in the older version of Search Console, this was under Crawl > Sitemaps, with a red ‘Add/Test Sitemap’ button in the upper right corner), then entering the sitemap’s URL and submitting it. Once it’s submitted, you’ll be able to see in Search Console whether Google has found any issues with it, and if so, what they are. This will also make Google aware of the sitemap, if it wasn’t already, which should encourage it to check for updates on a regular basis.

Stick to the Rules

Even once you’ve been accepted to Google News, your content will continue to be judged by all the strict criteria I’ve mentioned above. Any rule breaking can get your site removed pretty swiftly, and you’ll need to wait 60 days before you can reapply.

Keep referring back to this guide, especially if you’re planning a change in focus or strategy or you’re making significant changes to your website. Submit a News sitemap to Search Console and check it regularly for any issues. Stick to the facts, and make sure your content remains newsworthy and of a high standard, and you’ll continue making headlines in Google News for a long time.