User-agent: * # the following directives are for all spiders/crawlers, but primarily for Google's crawlers.
# Rules have been verified against Googlebot
### ALLOWS ###
# CSS/JS, image & other misc. files - allow these to be crawled, so that site contents & layout are rendered (by Googlebot) in as similar a way as full browsers
# IMPORTANT: Update the following allow rules to be more specific, if more (specific) disallow rules are added in the future, and vice-versa
Allow: /*.*css
Allow: /*.*js
Allow: /*.*gif
Allow: /*.*png
Allow: /*.*jp*
Allow: /*mages # *Images
Allow: /*.*ashx
Allow: /*.*ico
# CSS/JS, image & other misc. files under SharePoint system URLs (_layouts, _catalogs, etc.) - allow these to be crawled, so that site contents & layout are rendered (by Googlebot) in as similar a way as full browsers
# NOTE: a few of the rules below are needed as more specific rules (longer string wins), to counter some disallow rules (if specified further below)
Allow: /*_*.*css
Allow: /*_*.*js
Allow: /*_*.*gif
Allow: /*_*.*png
Allow: /*_*.*jp*
Allow: /*_*mages # *Images
Allow: /*_*.*ashx
Allow: /*_*.*ico
# CAS
Allow: /*content-sub-site/*ocuments
Allow: /content-sub-site/Documents
### DISALLOWS ###
# SP FS
# File System entries for SharePoint 2010 & 2013
Disallow: /*_lay # Shorter Disallow rule to ensure that the longer Allow rule wins in a conflict
# SP CDB
# Content Database entries for SharePoint 2010 & 2013
Disallow: /*_cat # Shorter Disallow rule to ensure that the longer Allow rule wins in a conflict
Disallow: /*_cts
Disallow: /*_private
Disallow: /m/ # / at end included to force rule to disallow crawling of only '/m/' directory
Disallow: /*eusable*ontent # ReusableContent
Disallow: /*ite*ssets # SiteAssets
Disallow: /*orkflow # WorkFlow
Disallow: /*Lists/
Disallow: /*lists/ # / at end included to force rule to disallow crawling of only those URLs that have a / after the word 'lists'
Disallow: /*ocuments*/*orms # Documents/Forms
Disallow: /*ages*/*orms # Images/Forms, Pages/Forms
# SP /FORMS/ - commonly-found files under SharePoint's /*Forms/ directory that fall outside the scope of the other defined rules for /Forms/ under Documents, Images, Pages libraries
Disallow: /*Combine.aspx
Disallow: /*combine.aspx
Disallow: /*ll*tems.aspx # AllItems.aspx (in Lists and Libraries)
Disallow: /*isp*orm.aspx # DispForm.aspx
Disallow: /*dit*orm.aspx # EditForm.aspx
Disallow: /*ew*orm.aspx # NewForm.aspx
Disallow: /*repair.aspx
Disallow: /*humb*ails.aspx # Thumbnails.aspx
Disallow: /*Upload.aspx
Disallow: /*upload.aspx
# CALENDAR - some calendar-related entries to help avoid infinite crawling loops
Disallow: /*alendar*ervice.ashx # CalendarService.ashx (Kwizcom Calendar webpart)
# Disallow: /*alendar/ # Calendar
# CAS
Disallow: /*cas-it
Disallow: /*content-sub-site
# SP - sample of more specific URLs to exclude, for SharePoint (internal search) testing
Disallow: /Lists/
Disallow: /cas-it/
Disallow: /cas-it-sub-site/
Disallow: /content-sub-site/
# This file created & maintained by Mihir Sheth
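# Worked example of the "longer string wins" note (illustrative only: both URLs are hypothetical, and the outcome
# assumes the crawler resolves Allow/Disallow conflicts by rule-path length, as Googlebot is documented to do; other crawlers may differ):
#   /_layouts/styles/core.css matches "Disallow: /*_lay" (6 characters) and "Allow: /*_*.*css" (9 characters), so the longer Allow rule should win and the file can be crawled
#   /_layouts/settings.aspx matches "Disallow: /*_lay" and none of the Allow rules, so it should not be crawled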