robots.txt Detected
- CWE 425
- WASC 34
The robots.txt file defines crawling rules for search engine bots and is, by design, publicly accessible. While intended only to guide crawlers, robots.txt may inadvertently disclose sensitive paths, directories, or resources that developers wish to hide from indexing. Attackers can use the file to identify hidden administrative pages, backup directories, or other sensitive locations, facilitating reconnaissance and targeted attacks.
Common patterns leading to robots.txt exposure:
- Listing sensitive directories or files in `Disallow` directives (e.g., `/admin`, `/backup`, `/config`).
- Using `robots.txt` to hide test environments, staging servers, or temporary resources.
- Allowing public access to files containing URLs or paths that should remain confidential.
- Misunderstanding that `robots.txt` is advisory for bots, not a security control.
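A file exhibiting these patterns might look like the following (a hypothetical example; the paths are illustrative):

```text
User-agent: *
Disallow: /admin
Disallow: /backup
Disallow: /config
Disallow: /staging
```

Each `Disallow` line here tells a well-behaved crawler to skip the path, but it also hands any human reader a list of exactly the locations the developer considered sensitive.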
Impacts:
- Information Disclosure: Reveals internal paths, hidden directories, or sensitive resources.
- Facilitates Reconnaissance: Attackers gain a roadmap of application structure for enumeration.
- Increased Attack Surface: Knowledge of hidden resources aids exploitation such as file inclusion, backup file access, or administrative page attacks.
- False Sense of Security: Developers may rely on `robots.txt` to protect sensitive resources, which does not prevent manual access.
Detection indicators:
- Publicly accessible `robots.txt` listing sensitive paths.
- Security scanners flagging known sensitive directories in `robots.txt`.
- Discovery of unprotected resources referenced in `robots.txt`.
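Checking for the first indicator can be automated by extracting every `Disallow` entry for manual review. A minimal sketch in Python (the parsing logic and sample content are illustrative, not any particular scanner's implementation):

```python
# Extract every path listed in a Disallow directive so the entries
# can be reviewed for sensitive locations. The sample body below is
# a hypothetical robots.txt, not fetched from a real site.
robots_txt = """\
User-agent: *
Disallow: /admin
Disallow: /backup
Allow: /public
"""

def disallowed_paths(body: str) -> list[str]:
    """Return all paths named in Disallow directives."""
    paths = []
    for line in body.splitlines():
        # Drop comments and surrounding whitespace before matching.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "allow everything"
                paths.append(path)
    return paths

print(disallowed_paths(robots_txt))  # ['/admin', '/backup']
```

In practice the body would be fetched from `https://target/robots.txt`; the interesting step is then requesting each extracted path to see whether it is actually protected.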
Remediation
Mitigation focuses on limiting sensitive data exposure and proper access control:
- Do Not Include Sensitive Directories or Files: Avoid listing sensitive paths in `robots.txt`. Use access controls to protect them instead.
- Implement Authentication and Authorization: Secure administrative or confidential directories regardless of `robots.txt` entries.
- Use `robots.txt` Only for Crawling Guidance: Limit its use to non-sensitive pages meant to control search engine indexing.
- Monitor and Audit Access: Track access attempts to sensitive directories and resources referenced in `robots.txt`.
- Review Deployment Pipelines: Ensure that no sensitive paths are inadvertently exposed in `robots.txt` during deployment.
- Security Testing: Include checks for sensitive information disclosure via `robots.txt` in penetration tests and automated scans.
- Educate Developers: Clarify that `robots.txt` is not a security mechanism and should not be relied upon to hide sensitive data.
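The security-testing step above can be sketched as a simple filter that flags `Disallow` entries matching common sensitive-path patterns. This is a minimal Python illustration; the marker list is an assumption, and a real scan would use a much larger signature set:

```python
# Flag Disallow entries whose names suggest sensitive resources,
# as an automated scan step might. The marker list is illustrative.
SENSITIVE_MARKERS = ("admin", "backup", "config", "staging", "test", ".git")

def flag_sensitive(paths: list[str]) -> list[str]:
    """Return the subset of paths matching a sensitive-path marker."""
    return [p for p in paths
            if any(marker in p.lower() for marker in SENSITIVE_MARKERS)]

print(flag_sensitive(["/admin", "/blog", "/backup.zip", "/images"]))
# ['/admin', '/backup.zip']
```

Any flagged path should then be requested directly: if it returns content without authentication, the finding is no longer just information disclosure.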
You may also see
- Readable .htaccess file
- apc.php page
- Webalizer script
- phpinfo page
- Apache perl-status Enabled
- Apache server-info Enabled
- Apache server-status Enabled
- JetBrains .idea project directory
- AWStats script
- elmah.axd Detected
- Core dump checker PHP script
- trace.axd Detected
- .DS_Store file
- Macromedia Dreamweaver database scripts
- Help file
- robots.txt Detected
- Sitemap Detected
- crossdomain.xml Detected
- Silverlight Client Access Policy
- Laravel log file
- Code Repository
- Configuration File
- Administration page
- Predictable Resource Location