Analyzing “How Google Search Works” Changes from Google

      Google has made substantial changes to its “How Google Search Works” documentation for website owners. As always when Google changes documents that matter for SEO, such as How Search Works and the Quality Rater Guidelines, there are some key insights SEOs can glean from the changes.

      Of particular note, Google details how it views a “document” as potentially comprising more than one webpage and what it considers primary and secondary crawls, and it has updated its reference to “more than 200 ranking factors,” which has been present in this document since 2013.

      Here are the changes and what they mean for SEOs.


      Contents

      · 1 Crawling

      · 1.1 Improving Your Crawling

      · 2 The Long Version

      · 3 Crawling

      · 3.1 How does Google find a page?

      · 3.2 Improving Your Crawling

      · 4 Indexing

      · 4.1 Improving your Indexing

      · 4.1.1 What is a document?

      · 5 Serving Results

      · 6 Final Thoughts

      · 6.0.1 Jennifer Slegg

      -------------------------------------------------------------

      Crawling

      Google has greatly expanded this section.

      They made a slight change to the wording, with “some pages are known because Google has already crawled them before” changed to “some pages are known because Google has already visited them before.” This is a fairly minor change, made primarily because Google decided to include an expanded section detailing what crawling actually is.


      Google removed:


      This process of discovery is called crawling.


      The crawling definition was removed simply because it had become redundant. In the expanded crawling section, Google included a much more detailed definition and description of crawling instead.


      The added definition:


      Once Google discovers a page URL, it visits, or crawls, the page to find out what’s on it. Google renders the page and analyzes both the text and non-text content and overall visual layout to decide where it should appear in Search results. The better that Google can understand your site, the better we can match it to people who are looking for your content.


      There is still a great debate about how much page layout is taken into account. The page layout algorithm, released many years ago, penalized content that was pushed well below the fold in order to increase the odds that a visitor would click on an advertisement appearing above the fold instead. But with more traffic moving to mobile, and the addition of mobile first indexing, the distinction between above and below the fold in page layout seemingly became less important.


      When it comes to page layout and mobile first, Google says:


      Don’t let ads harm your mobile page ranking. Follow the Better Ads Standard when displaying ads on mobile devices. For example, ads at the top of the page can take up too much room on a mobile device, which is a bad user experience.


      But in How Google Search Works, Google is specifically calling attention to the “overall visual layout” in connection with “where it should appear in Search results.”


      It also brings attention to “non-text” content. While the most obvious example is image content, the reference is quite open ended. Could it refer to OCR as well, which we know Google has been dabbling in?


      Improving Your Crawling


      Google has significantly expanded the “to improve your site crawling” section as well.


      Google has added this point:


      Verify that Google can reach the pages on your site, and that they look correct. Google accesses the web as an anonymous user (a user with no passwords or information). Google should also be able to see all the images and other elements of the page to be able to understand it correctly. You can do a quick check by typing your page URL in the Mobile-Friendly test tool.


      This is a good point. So many new site owners end up accidentally blocking Googlebot from crawling, or do not realize their site is set to be viewable by logged in users only. This makes it clear that site owners should try viewing their site without being logged in, to see if there are any unexpected accessibility or other issues that aren’t noticeable when logged in as an admin or high level user.
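
      One quick way to catch the accidental blocking described above is to test URLs against the site's robots.txt rules as Googlebot would see them. The following is a minimal sketch using Python's standard library; the robots.txt content and the example.com URLs are made up for illustration, and it only covers robots.txt rules, not login walls or rendering problems.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would load the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /wp-admin/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical pages to check.
urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/members/profile",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

      Anything this prints is a page Googlebot is being told not to crawl, which is worth a second look if the page is supposed to rank.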


      Recommending that site owners check their site via the Mobile-Friendly testing tool is also good, since even seasoned SEOs use the tool to quickly see if there are Googlebot specific issues with how Google is able to see, render and crawl a specific webpage, or a competitor’s page.


      Google expanded their specific note about submitting a single page to the index.


      If you’ve created or updated a single page, you can submit an individual URL to Google. To tell Google about many new or updated pages at once, use a sitemap.


      Previously, it just mentioned submitting changes to a single page using the submit URL tool. This clarifies for those who are newer to SEO that they do not need to submit every single new or updated page to Google individually, and that using sitemaps is the best way to do that. There have definitely been new site owners who add each page to Google using that tool because they don’t realize sitemaps are a thing. Part of this is that WordPress is such a prevalent way to create a new website, yet it does not have native support for sitemaps (yet), so site owners need to either install a specific sitemaps plugin or use one of the many SEO tool plugins that offer sitemaps as a feature.
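
      For site owners without a plugin, a basic sitemap is simple enough to generate by hand. Here is a minimal sketch, assuming a plain list of page URLs (the example.com addresses are placeholders); it emits only the required `<loc>` element of the sitemaps.org protocol and leaves out optional fields such as `<lastmod>`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal XML sitemap string listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages on a small site.
sitemap_xml = build_sitemap(["https://example.com/", "https://example.com/about"])
print(sitemap_xml)
```

      The resulting string can be saved as sitemap.xml (with an XML declaration added on top) and submitted once, rather than submitting each page individually.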


      This change also highlights using the tool for newly created pages, instead of just the previous reference to “changes to a single page.”


      Google has also changed the “if you ask Google to crawl only one page” section. They now reference what Google views as a “small site”: according to Google, a smaller site is one with fewer than 1,000 pages.


      Google also stresses the importance of a strong navigation structure, even for sites it considers “small.” It says site owners of small sites can just submit their homepage to Google, “provided that Google can reach all your other pages by following a path of links that start from your homepage.”


      With so many sites being on WordPress, it is less likely that there will be random orphaned pages that are not accessible by following links from the homepage. But depending on the specific WordPress theme used, orphaned pages can sometimes result when pages are added but not manually added to the pages menu. In these cases, if a sitemap is used as well, those pages shouldn’t be missed even if they are not directly linked from the homepage.
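
      Finding orphaned pages comes down to checking which pages cannot be reached by a path of links from the homepage. The sketch below does that with a breadth-first search over a link graph; the `links` mapping and page paths are hypothetical, and in practice the graph would come from a crawl of your own site.

```python
from collections import deque


def reachable_from(home, links):
    """Return every page reachable by following links starting at home."""
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_pages(home, links, all_pages):
    """Return pages that exist on the site but have no link path from home."""
    return sorted(set(all_pages) - reachable_from(home, links))


# Hypothetical site: /landing exists but nothing links to it.
links = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/"],
    "/landing": ["/"],
}
all_pages = ["/", "/blog", "/about", "/blog/post-1", "/landing"]
print(orphaned_pages("/", links, all_pages))
```

      Pages that show up in the result are candidates for a menu link, an internal link, or at least a sitemap entry.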


      In the “get your page linked to by another page” section, Google has added that “links in advertisements, links that you pay for in other sites, links in comments, or other links that don’t follow the Google Webmaster Guidelines won’t be followed by Google.” A small change, but Google is making it clear that not following these links is Google specific, and they might still be followed by other search engines.


      But perhaps the most telling part comes at the end of the crawling section, where Google adds:


      Google doesn’t accept payment to crawl a site more frequently, or rank it higher. If anyone tells you otherwise, they’re wrong.


      Scammy SEO companies have long been an issue, whether guaranteeing first position on Google, promising increased rankings, or requiring payment to submit a site to Google. And with the ambiguous Google Partner badge for AdWords, many use the badge to imply they are certified by Google for SEO and organic ranking purposes. That said, most of those reading How Search Works are probably already aware of this. But it is nice to see Google put this in writing again, for times when SEOs need to prove to clients that there is no “pay to win” option outside of AdWords, or simply to show someone who might be falling for some scammy SEO company’s claims about Google rankings.


      The Long Version


      Google then gets into what they call the “long version” of How Google Search Works, with more details on the above sections, covering more nuances that impact SEO.


      Crawling


      Google has changed how they refer to the “algorithmic process.” Previously, it stated “Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often and how many pages to fetch from each site.” Curiously, they removed the reference to “computer programs,” which had provoked the question of exactly which computer programs Google was using.


      The new updated version simply states:


      Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site.


      Google also updated the wording for the crawl process, changing “augmented with sitemap data” to “augmented by sitemap data.”


      Google also changed the reference to Googlebot “detects” links to “finds” links, and changed Googlebot visiting “each of these websites” to the much more specific “page.” This second change makes it more accurate, since Google visiting a website won’t necessarily mean it crawls all links on all pages. The change to “page” makes it more accurate and specific for webmasters.


      Previously it read:


      As Googlebot visits each of these websites it detects links on each page and adds them to its list of pages to crawl.


      Now it reads:


      When Googlebot visits a page it finds links on the page and adds them to its list of pages to crawl.
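
      To make “finds links on the page and adds them to its list of pages to crawl” concrete, here is a toy illustration of that one step using Python's standard HTML parser. It is not how Googlebot is implemented, just a sketch of the idea; the `LinkFinder` class, the sample HTML, and the example.com URLs are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collect absolute URLs from the <a href> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def find_links(base_url, html):
    """Return the links found on a single page, in document order."""
    finder = LinkFinder(base_url)
    finder.feed(html)
    return finder.links


# Hypothetical page content.
page_html = '<a href="/about">About</a> <a href="https://example.com/blog">Blog</a>'
to_crawl = ["https://example.com/"]
for link in find_links("https://example.com/", page_html):
    if link not in to_crawl:  # avoid queueing the same page twice
        to_crawl.append(link)
print(to_crawl)
```

      Note that the unit of work is a single page, which matches the more specific wording in the updated document.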


      Google has added a new section about using Chrome to crawl:

      During the crawl, Google renders the page using a recent version of Chrome. As part of the rendering process, it runs any page scripts it finds. If your site uses dynamically-generated content, be sure that you follow the JavaScript SEO basics.


      By referencing a recent version of Chrome, this addition clarifies the change from last year, when Googlebot was finally upgraded to the latest version of Chromium for crawling, after years of crawling only with Chrome 41.


      Google also notes it runs “any page scripts it finds,” and advises site owners to be aware of possible crawl issues that result from dynamically generated content built with JavaScript, specifying that site owners should ensure they follow Google’s JavaScript SEO basics.


      Google also details the primary and secondary crawls, something that has caused much confusion since Google first revealed them. The How Google Search Works document describes them differently from how some SEOs previously interpreted them.


      Here is the entire new section for primary and secondary crawls:


      Primary crawl / secondary crawl


      Google uses two different crawlers for crawling websites: a mobile crawler and a desktop crawler. Each crawler type simulates a user visiting your page with a device of that type.


      Google uses one crawler type (mobile or desktop) as the primary crawler for your site. All pages on your site that are crawled by Google are crawled using the primary crawler. The primary crawler for all new websites is the mobile crawler.


      In addition, Google recrawls a few pages on your site with the other crawler type (mobile or desktop). This is called the secondary crawl, and is done to see how well your site works with the other device type.


      In this section, Google refers to primary and secondary crawls as being specific to their two crawlers, the mobile crawler and the desktop crawler. Many SEOs think of primary and secondary crawling in reference to Googlebot making two passes over a page, where JavaScript is rendered on the second pass. So while Google clarifies their use of desktop and mobile Googlebots, the language here does cause confusion for those who use these terms to refer to the two passes for JavaScript purposes. To be clear, Google’s reference to primary and secondary crawl here has nothing to do with JavaScript rendering, only with how they use both mobile and desktop Googlebots to crawl and check a page.


      What Google is clarifying in this specific reference to primary and secondary crawl is that it uses two crawlers, the mobile and desktop versions of Googlebot, and will crawl sites using a combination of both.


      Google did specifically state that new websites are crawled with the mobile crawler in their “Mobile-First Indexing Best Practices” document, as of July 2019. But this is the first time it has made an appearance in their How Google Search Works document.


      Google does go into more detail about how it uses both the desktop and mobile Googlebots, particularly for sites that Google currently considers mobile first. It wasn’t clear just how much Google was checking desktop versions of sites that were mobile first, and some have tried to take advantage of this by presenting a spammier version to desktop users, or in some cases completely different content. But Google is confirming it still checks the alternate version of the page with its crawlers.


      So sites that are mobile first will see some of their pages crawled with the desktop crawler. However, it still isn’t clear how Google handles cases where the two versions are vastly different, especially when done for spam reasons, as there doesn’t seem to be any penalty for doing so, aside from a possible spam manual action if the site is checked or a spam report is submitted. This would have been a perfect opportunity to be clearer about how Google handles pages whose content differs vastly depending on whether they are viewed on desktop or on mobile. Even in the mobile friendly documents, Google only warns about ranking differences if content is on the desktop version of the page but is missing from the mobile version.
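
      Since the one documented risk is content that exists on desktop but is missing on mobile, a site owner can run a rough parity check on their own pages. This is a simple sketch, assuming the visible text of each version has already been extracted into a list of text blocks; the function names and sample blocks are hypothetical, and it says nothing about how Google itself compares the two versions.

```python
def normalize(block):
    """Collapse whitespace and case so trivial formatting differences are ignored."""
    return " ".join(block.lower().split())


def missing_on_mobile(desktop_blocks, mobile_blocks):
    """Return desktop text blocks that have no counterpart on the mobile version."""
    mobile = {normalize(block) for block in mobile_blocks}
    return [block for block in desktop_blocks if normalize(block) not in mobile]


# Hypothetical extracted text from the two versions of one page.
desktop = [
    "Summer dresses",
    "Free shipping on orders over $50",
    "Full size guide and fabric care",
]
mobile = ["Summer Dresses", "Free shipping on  orders over $50"]
print(missing_on_mobile(desktop, mobile))
```

      Anything reported here is content the mobile crawler would not see on a mobile first site.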


      How does Google find a page?


      Google has removed this section entirely from the new version of the document.


      Here is what was included in it:


      How does Google find a page?


      Google uses many techniques to find a page, including:

      · Following links from other sites or pages

      · Reading sitemaps


      It isn’t clear why Google removed this specifically. It is slightly redundant, and it was also missing the option of submitting a URL.


      Improving Your Crawling


      Google makes the use of hreflang a bit clearer by providing a bit more detail, especially for those who might just be learning what hreflang is and how it works.


      Formerly it said “Use hreflang to point to alternate language pages.” Now it states “Use hreflang to point to alternate versions of your page in other languages.”


      Not a huge change, but a bit clearer.
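
      For those still learning hreflang, the annotations are just a set of `<link rel="alternate">` tags listing each language version of a page. Below is a small sketch that generates them from a mapping of language codes to URLs; the `alternates` mapping and its example.com URLs are placeholders, and the same set of tags is meant to go in the `<head>` of every language version.

```python
from html import escape


def hreflang_tags(alternates):
    """Return one <link rel="alternate"> tag per language version of a page."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in alternates.items()
    ]


# Hypothetical language versions of the same page.
alternates = {
    "en": "https://example.com/en/dresses",
    "de": "https://example.com/de/kleider",
}
for tag in hreflang_tags(alternates):
    print(tag)
```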


      Google has also added two new points, providing more detail about ensuring Googlebot is able to access everything on the page, not just the textual content (words).


      First, Google added:


      Be sure that Google can access the key pages, and also the important resources (images, CSS files, scripts) needed to render the page properly.


      So Google is stressing the need to ensure it can access all the important content. It is also specifically calling attention to other types of elements on the page that Google needs access to in order to properly crawl the page, including images, CSS and scripts. Webmasters who went through the whole “mobile first indexing” launch are fairly familiar with issues surrounding blocked files, especially CSS and scripts, which some CMSs had blocked Googlebot from crawling by default.


      But newer site owners might not realize this is possible, or that they might be doing it. It would have been nice to see Google add specific information on how those newer to SEO can check for this, particularly for those who also might not be clear on what exactly “rendering” means.
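
      One way newer site owners could check for blocked resources themselves is to list the images, stylesheets and scripts a page references and test each against the site's robots.txt rules. The following is a rough sketch with Python's standard library; the `ResourceFinder` class, sample HTML and robots.txt rules are invented for the example, and it only finds resources referenced directly in the HTML, not ones loaded later by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class ResourceFinder(HTMLParser):
    """Collect image, script and stylesheet URLs referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script"):
            value = attrs.get("src")
        elif tag == "link" and attrs.get("rel") == "stylesheet":
            value = attrs.get("href")
        else:
            value = None
        if value:
            self.resources.append(urljoin(self.base_url, value))


def blocked_resources(page_url, html, robots_txt, user_agent="Googlebot"):
    """Return the page's resources that robots.txt disallows for user_agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = ResourceFinder(page_url)
    finder.feed(html)
    return [res for res in finder.resources if not rules.can_fetch(user_agent, res)]


# Hypothetical page and rules: everything under /assets/ is blocked.
sample_html = (
    '<link rel="stylesheet" href="/assets/site.css">'
    '<script src="/assets/app.js"></script>'
    '<img src="/images/hero.png">'
)
sample_robots = "User-agent: *\nDisallow: /assets/\n"
print(blocked_resources("https://example.com/", sample_html, sample_robots))
```

      If CSS or script files appear in the output, Google may not be able to render the page the way visitors see it.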


      Google also added:


      Confirm that Google can access and render your page properly by running the URL Inspection tool on the live page.


      Here Google does add specific information about using the URL Inspection tool to see what site owners are blocking, or which content causes issues when Google tries to render it. I think these last two new points could have been combined, and made slightly clearer about how site owners can use the tool to check for all these issues.


      Indexing


      Google has made significant changes to this section as well. Google starts off by making major changes to the first paragraph. Here is the original version:


      Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as <title> tags and alt attributes.


      The updated version now reads:


      Googlebot processes each page it crawls in order to understand the content of the page. This includes processing the textual content, key content tags and attributes, such as <title> tags and alt attributes, images, videos, and more.


      Google no longer states it processes pages to “compile a massive index of all the words it sees and their location on each page.” This was always a curious way to describe it, suggesting Google simply indexes every word it comes across and its position on a page, when in reality the process is a lot more complex than that. So this definitely clears that up.


      They have also added that they process “textual content,” which basically calls attention to the fact that Google indexes the words on the page, something everyone assumed. But it does distinguish the text from the new additions later in the paragraph regarding images, videos and more.


      Previously, Google simply made reference to tags and attributes such as title tags and alt attributes. Now it is getting more granular, specifically referring to “images, videos, and more.” This does mean Google is considering images, videos and “more” to understand the content on the page, which could affect rankings.


      Improving your Indexing


      Google changed “read our SEO guide for more tips” to “Read our basic SEO guide and advanced user guide for more tips.”


      What is a document?


      Google has added a massive section here called “What is a document?” It talks specifically about how Google determines what a document is, and also includes details about how Google views multiple pages with identical content as a single document, even with different URLs, and how it determines canonicals.


      First, here is the first part of this new section:

      What is a “document”?


      Internally, Google represents the web as an (enormous) set of documents. Each document represents one or more web pages. These pages are either identical or very similar, but are essentially the same content, reachable by different URLs. The different URLs in a document can lead to exactly the same page (for instance, example.com/dresses/summer/1234 and example.com?product=1234 might show the same page), or the same page with small variations intended for users on different devices (for example, example.com/mypage for desktop users and m.example.com/mypage for mobile users).


      Google chooses one of the URLs in a document and defines it as the document’s canonical URL. The document’s canonical URL is the one that Google crawls and indexes most often; the other URLs are considered duplicates or alternates, and may occasionally be crawled, or served according to the user request: for instance, if a document’s canonical URL is the mobile URL, Google will still probably serve the desktop (alternate) URL for users searching on desktop.


      Most reports in Search Console attribute data to the document’s canonical URL. Some tools (such as the Inspect URL tool) support testing alternate URLs, but inspecting the canonical URL should provide information about the alternate URLs as well.


      You can tell Google which URL you prefer to be canonical, but Google may choose a different canonical for various reasons.


      So the tl;dr is that Google will view pages with identical or near-identical content as the same document, regardless of how many of them there are. Seasoned SEOs know this as internal duplicate content.
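
      To picture the “document” idea, think of grouping URLs by what they actually serve. The sketch below groups URLs whose text is identical after normalizing case and whitespace; this is purely an illustration and an assumption on my part, not Google's method, which also handles near-identical pages and is not described in the documentation. The URLs are borrowed from Google's own example, and the page text is made up.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the text after collapsing case and whitespace differences."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_into_documents(pages):
    """Group URLs whose content fingerprints match into one 'document' each."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return sorted(sorted(urls) for urls in groups.values())


# Hypothetical pages: two URLs serve the same product page.
pages = {
    "example.com/dresses/summer/1234": "Summer dress",
    "example.com?product=1234": "summer   dress",
    "example.com/about": "About us",
}
for document in group_into_documents(pages):
    print(document)
```

      Each printed group is one “document” in this simplified sense, and one URL from each group would end up as its canonical.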


      Google also states that once it identifies these duplicates, they may not be crawled as often. This is important to note for site owners who are working to de-duplicate content that Google considers duplicate. It becomes more important to submit these URLs to be recrawled, or to give the newly de-duplicated pages links from the homepage, to ensure Google recrawls and indexes the new content and de-dupes the pages properly.


      It also brings up an important note about desktop versus mobile: Google will still likely serve the desktop version of a page instead of the mobile version to desktop users when a site has two different URLs for the same page, one designed for mobile users and the other for desktop. While many websites have changed to serving the same URL and content for both using responsive design, some sites still run two completely different sites and URLs for desktop and mobile users.


      Google also mentions that you can tell Google which URL you prefer it to use as the canonical, but states it can choose a different URL “for various reasons.” While Google doesn’t detail why it might choose a different canonical than the one the site owner specifies, it is usually due to http vs https, whether a page is included in a sitemap, page quality, the pages appearing to be completely different and therefore unsuitable for canonicalization, or significant incoming links to the non-canonical URL.


      Google has also included definitions for many of the terms used by SEOs and in Google Search Console.


      Document: A collection of similar pages. Has a canonical URL, and possibly alternate URLs, if your site has duplicate pages. URLs in the document can be from the same or different organization (the root domain, for example “google” in www.google.com). Google chooses the best URL to show in Search results according to the platform (mobile/desktop), user language* or location, and many other variables. Google discovers related pages on your site by organic crawling, or by site-implemented features such as redirects or <link rel=alternate/canonical> tags. Related pages on other organizations can only be marked as alternates if explicitly coded by your site (through redirects or link tags).


      Again, Google is talking about the fact that a single document can encompass more than a single URL, as Google considers a single document to potentially include many duplicate or near duplicate pages as well as pages assigned via canonical. Google makes specific mention of “alternates” that appear on other sites, which can only be considered alternates if the site owner specifically codes them. And Google will choose the best URL to show from within the document’s collection of URLs.


      But it fails to mention that Google can also treat pages on other sites as duplicates and will not show those duplicates, even though they aren’t from the same site. Site owners see this happen frequently when someone steals content, and sometimes the stolen version ranks over the original.


      Google added a note to the above, dealing with hreflang.


      * Pages with the same content in different languages are stored in different documents that reference each other using hreflang tags; this is why it’s important to use hreflang tags for translated content.


      Google shows that it doesn’t include identical content under the same “document” when it is simply in a different language, which is interesting. But Google is stressing the importance of using hreflang in these cases.
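
      As a rough illustration of what “reference each other using hreflang tags” looks like in markup, the Python sketch below builds the set of <link rel="alternate" hreflang="…"> tags for a group of translated pages. The function name hreflang_links and the example.com URLs are assumptions made for this example. The sketch assumes the common practice of placing the same full set of tags, including a self-reference, on every language version.

```python
from html import escape


def hreflang_links(translations, default_lang=None):
    """Build hreflang link tags for a set of translated pages.

    translations: dict mapping a language code (e.g. "en", "de") to that page's URL.
    default_lang: optional key in translations whose URL is also listed as "x-default".
    """
    tags = []
    for lang, url in sorted(translations.items()):
        tags.append(
            f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        )
    if default_lang is not None:
        fallback = translations[default_lang]
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(fallback)}">'
        )
    return tags


# Hypothetical translated pages; each one would carry this same block in its <head>.
pages = {
    "en": "https://example.com/en/widgets",
    "de": "https://example.com/de/widgets",
}

for tag in hreflang_links(pages, default_lang="en"):
    print(tag)
```

      Because each language version is a separate document, these tags are what tie the documents together; leaving them off one version breaks the mutual reference.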


      URL: The URL used to reach a given piece of content on a site. The site might resolve different URLs to the same page.


      Pretty self-explanatory, although it does refer to the fact that different URLs can resolve to the same page, presumably through redirects or aliases.


      Page: A given web page, reached by one or more URLs. There can be different versions of a page, depending on the user’s platform (mobile, desktop, tablet, and so on).


      Also pretty self-explanatory, bringing up the specifics that visitors can be served different versions of the same page, for example when they view it on a mobile device versus a desktop computer.


      Version: One variation of the page, typically categorized as “mobile,” “desktop,” and “AMP” (although AMP can itself have mobile and desktop versions). Each version can have a different URL (example.com vs m.example.com) or the same URL (if your site uses dynamic serving or responsive web design, the same URL can show different versions of the same page) depending on your site configuration. Language variations are not considered different versions, but different documents.


      This simply clarifies in greater detail the different versions of a page, and how Google typically categorizes them as “mobile,” “desktop,” and “AMP.”


      Canonical page or URL: The URL that Google considers as most representative of the document. Google always crawls this URL; duplicate URLs in the document are occasionally crawled as well.


      Google states here again that non-canonical pages are not crawled as often as the canonical page that a site owner assigns to a group of pages. Google does not specifically mention here that it sometimes chooses a different page as the canonical, even when a specific page has been designated as the canonical one.


      Alternate/duplicate page or URL: The document URL that Google might occasionally crawl. Google also serves these URLs if they are appropriate to the user and request (for example, an alternate URL for desktop users will be served for desktop requests rather than a canonical mobile URL).


      The key takeaway here is that Google “might” occasionally crawl a site’s duplicate or alternate pages. And here they stress that Google will serve these alternate URLs “if they are appropriate.” It is unfortunate they don’t go into greater detail on why they might serve these pages instead of the canonical, beyond the mention of desktop versus mobile, as we have seen many cases where Google picks a page other than the canonical to show, for a myriad of reasons.
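
      The desktop-versus-mobile example in Google’s definition can be pictured with a toy model like the one below. This is only a sketch of the one case Google describes (serving a desktop alternate for a desktop request when the canonical is the mobile URL); the Document class, its field names, and the URLs are hypothetical, and Google’s real selection depends on many signals it does not document.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy model of a 'document': one canonical URL plus per-platform alternates."""

    canonical_url: str
    canonical_platform: str                          # platform the canonical URL is built for
    alternates: dict = field(default_factory=dict)   # platform name -> alternate URL

    def url_for(self, platform):
        """Pick the URL to serve for a request from the given platform."""
        if platform == self.canonical_platform:
            return self.canonical_url
        # Serve a matching alternate if one exists, otherwise fall back to the canonical.
        return self.alternates.get(platform, self.canonical_url)


# Hypothetical document whose canonical is the mobile URL.
doc = Document(
    canonical_url="https://m.example.com/widgets",
    canonical_platform="mobile",
    alternates={"desktop": "https://example.com/widgets"},
)

print(doc.url_for("mobile"))
print(doc.url_for("desktop"))
print(doc.url_for("tablet"))
```

      The fallback to the canonical for an unknown platform is a design choice of this sketch, not something Google’s documentation states.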


      Google also fails to mention how this impacts duplicate content found on other sites, although we do know Google will crawl those less often as well.


      Site: Usually used as a synonym for a website (a conceptually related set of web pages), but sometimes used as a synonym for a Search Console property, although a property can actually be defined as only part of a site. A site can span subdomains (and even domains, for properly linked AMP pages).


      It is interesting to note here what they consider a website (a conceptually related set of webpages) and how it relates to the usage of a Google Search Console property, as “a property can actually be defined as only part of a site.”


      Google does mention that AMP pages, which technically appear on a different domain, are considered part of the main site.


      Serving Results


      Google has made a pretty interesting specific change here in regard to its ranking factors. Previously, Google stated:


      Relevancy is determined by over 200 factors, and we always work on improving our algorithm.


      Google has now replaced this “over 200 factors” with a less specific figure.


      Relevancy is determined by hundreds of factors, and we always work on improving our algorithm.


      The “200 factors” reference in How Google Search Works dates back to 2013, when the document was launched, although at that time it also referred to PageRank (“Relevancy is determined by over 200 factors, one of which is the PageRank for a given page”), which Google removed when it redesigned the document in 2018.


      While Google doesn’t go into specifics on the number anymore, it can be assumed that a significant number of ranking factors have been added since 2013, when the figure was first claimed in this document. But I am sure some SEOs will be disappointed that we don’t get a shiny new number, like “over 500” ranking factors, for SEOs to obsess about.


      Final Thoughts


      Some pretty significant changes were made to this document, and SEOs can get a bit of insight from them.


      Google’s description of what it considers a document, and how it relates to other identical or near-identical pages on a site, is interesting, as is Google’s crawling behavior towards the pages within a document that it considers alternate pages. While this behavior has often been noted, this is more concrete information on how site owners should handle these duplicate and near-duplicate pages, particularly when they are trying to un-duplicate those pages and see them crawled and indexed as their own documents.


      They added a lot of useful advice for newer site owners, which is particularly helpful with so many new websites coming online this year due to the global pandemic: things such as checking a site without being logged in, how to submit both pages and sites to Google, and so on.


      The mention of what Google considers a “small site” is interesting because it gives a more concrete reference point for how Google sees large versus small sites. For some, a small site could mean under 30 pages, with the idea of a site with millions of pages being unfathomable. And the reinforcement of strong navigation, even for “small sites,” is useful to show site owners and clients who might push for navigation that is more aesthetic than practical for both usability and SEO.


      The primary and secondary crawl additions will probably cause some confusion for those who think of primary and secondary in terms of how Google processes scripts on a page when it crawls it. But it is nice to have more concrete information on how and when Google will crawl using the alternate version of Googlebot for sites that are usually crawled with either the mobile Googlebot or the desktop one.


      Lastly, the change from the “200 ranking factors” to a less specific, but presumably much higher, number of ranking factors will disappoint some SEOs who liked having a specific number of potential ranking factors to work out.


      Source: Jennifer Slegg


