Information Leakage: Protect Against "Google Hacking"
Thankfully, the same tools and techniques that would-be attackers use to gather information about your network and find vulnerable files and data are available to you as well. You can use them to preemptively discover the weak points in your web security and make sure that any such data is properly protected.
Many search statements can be used to target specific data or file types that might contain confidential or private information. Here are some examples of Google search syntax that can be found at http://johnny.ihackstuff.com, a site created by the lead author of Google Hacking:
- Find Microsoft Excel files that contain login names and passwords
- "login: *" "password: *" filetype:xls
- Locate passwords in plain text found in exposed log files
- "your password is" filetype:log
- Discover insecure instances of the phpMyAdmin database frontend
- "Welcome to phpMyAdmin" " Create new database"
- Find human resources web sites from the internal intranet which are accessible to external users
- intitle:intranet inurl:intranet +intext:"human resources"
- Search for private contact lists that have been synced up from PDAs or cell phones
- contacts ext:wml
This is just a very small sampling of the search engine queries you can use to assess the security of your web presence and determine what sensitive or classified information is available to unauthorized third parties via the Web.
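To focus queries like these on your own web presence, each one can be combined with Google's `site:` operator so that only results from a single domain are returned. The following is a minimal Python sketch of that idea. The function names `scope_to_domain` and `search_url` and the domain `example.com` are illustrative assumptions, not part of any tool or site mentioned in this article.

```python
from urllib.parse import quote_plus

# Sample queries copied from the examples in this article.
SAMPLE_QUERIES = [
    '"login: *" "password: *" filetype:xls',
    '"your password is" filetype:log',
    '"Welcome to phpMyAdmin" " Create new database"',
    'intitle:intranet inurl:intranet +intext:"human resources"',
    'contacts ext:wml',
]


def scope_to_domain(query: str, domain: str) -> str:
    """Restrict a search query to one domain using the site: operator."""
    return f"site:{domain} {query}"


def search_url(query: str) -> str:
    """Build a URL that can be opened in a browser to run the query by hand."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for q in SAMPLE_QUERIES:
        # example.com is a placeholder; substitute a domain you are responsible for.
        scoped = scope_to_domain(q, "example.com")
        print(scoped)
        print("  " + search_url(scoped))
```

The sketch only builds query strings and URLs for manual review; it does not send any requests itself.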
Executing hundreds of individual searches by hand may be a bit daunting. Fortunately, there are a number of tools available that help to automate this process and quickly perform a risk assessment of the data available via the Web from a given domain.
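The core loop of such automation can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how any particular tool works: the names `assess_domain` and `fake_search` are hypothetical, and the search backend is left as a pluggable function so that you can supply whatever search interface you are permitted to use. The `fake_search` stand-in returns canned data, not real results.

```python
import time
from typing import Callable, Dict, Iterable, List

# A search backend takes a query string and returns a list of result URLs.
# It is deliberately left abstract (an assumption of this sketch).
SearchBackend = Callable[[str], List[str]]


def assess_domain(
    domain: str,
    queries: Iterable[str],
    search: SearchBackend,
    delay_seconds: float = 1.0,
) -> Dict[str, List[str]]:
    """Run each query restricted to `domain`; return only queries with hits."""
    findings: Dict[str, List[str]] = {}
    for query in queries:
        hits = search(f"site:{domain} {query}")
        if hits:
            findings[query] = hits
        # Pause after each query so the search service is not flooded.
        time.sleep(delay_seconds)
    return findings


def fake_search(query: str) -> List[str]:
    """Stand-in backend for demonstration; returns canned data only."""
    if "filetype:log" in query:
        return ["http://example.com/logs/mail.log"]
    return []


if __name__ == "__main__":
    report = assess_domain(
        "example.com",  # placeholder domain
        ['"your password is" filetype:log', "contacts ext:wml"],
        fake_search,
        delay_seconds=0,
    )
    for query, urls in report.items():
        print(query, "->", urls)
```

Keeping the backend separate from the loop means the same query list can be reused whichever search interface is chosen.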
Some search automation tools, such as Gooscan (http://johnny.ihackstuff.com), developed by Google Hacking author Johnny Long, and Athena (http://snakeoillabs.com), do not use the Google API. Scanning with them is technically a violation of the Google terms of service, which could result in your IP address or IP range being banned from using Google.
Two tools written for Microsoft Windows that do use the Google API are SiteDigger from Foundstone (http://www.foundstone.com), now a division of McAfee, and Wikto from Sensepost (http://www.sensepost.com). These tools do not violate the Google terms of service, and each provides a more graphical interface than Gooscan or Athena. Both tools are also capable of using the Google Hacking Database (GHDB) of Google search queries maintained at the johnny.ihackstuff.com web site.