W3C HTML batch Validator in Python
what is it
A script calling the W3C HTML Validator in batch mode. Adapted from an
extension to Python's httplib module, originally from a
Python Cookbook recipe.
The version used and available here is updated to use
available from Python 2.0 which also uses HTTP/1.1 (but does not work with Python versions below
validate.zip v1.7 090620, a complete example which should run unmodified.
- exampleconfig.txt (you may need to adapt the line endings to your OS)
v1.4 040908 (replace .txt with .py extension)
This is a modified version of the script which uses "text/html" as the default content-type if
mimetypes is unable to recognize it. This way it is possible to validate webpages like ".shtml", ".jsp", ".php" or ".py". In these cases you can only use the
UPLOADFROMURL=1 is also specified (from v1.6). The script fetches the files from the local
LOCALSERVERURL in this case.
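The content-type fallback described above can be sketched with the standard mimetypes module. This is only an illustration written for current Python 3, not the code from the modified module; the function name guess_content_type is hypothetical.

```python
import mimetypes


def guess_content_type(filename, default="text/html"):
    """Return the MIME type for filename, or default if it is unknown.

    Hypothetical helper: mimetypes.guess_type() returns (None, None)
    for extensions it does not know, so the fallback applies there.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default
```

Which extensions count as unknown depends on the MIME tables installed on the machine, so a file like "page.jsp" may or may not hit the fallback.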
> python validate.py exampleconfig.txt
Tested with Python 2.6.2 on Vista only.
The config file is basically a Python file in which some variables are (re-)defined. You may specify the following options (this may not be the most elegant or safe way, but it works well for this short script). All options are optional and have reasonable default values (apart from your website-specific URL and path information, of course).
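A minimal sketch of how a config file of this kind could be read, assuming it is executed as Python on top of a set of defaults. It is written for current Python 3 and is not taken from validate.py; the function name load_config is hypothetical, and the default values are only the example values shown in this document, not confirmed defaults.

```python
# Example values from this document, used here as stand-in defaults (assumption).
ILLUSTRATIVE_DEFAULTS = {
    "VALIDATORURL": "validator.w3.org:80",
    "EXTS": ["html", "htm"],
    "REPORTDIR": "__validator",
    "OPENREPORTS": 0,
}


def load_config(path):
    """Execute the config file and return the upper-case option names."""
    # The constant w3 is made available so a line like "VALIDATORURL = w3" works.
    namespace = {"w3": "validator.w3.org:80"}
    namespace.update(ILLUSTRATIVE_DEFAULTS)
    with open(path) as handle:
        code = compile(handle.read(), path, "exec")
    # Runs arbitrary Python from the file: only use config files you trust.
    exec(code, namespace)
    return {name: value for name, value in namespace.items() if name.isupper()}
```

Executing the file keeps the config syntax trivial, at the price that the config can run any code, which is the trade-off the paragraph above mentions.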
VALIDATORURL = w3
URL to the validator, default is constant
w3 = 'validator.w3.org:80'
You may want to use the URL to a locally installed copy of the W3C validator.
FILESERVERROOT = 'e:\\files'
Absolute path to your local file root. On Unix use / as the separator, on Windows use \\.
The script looks here for files to validate.
VALIDATEPATH = 'home'
Validate all files starting from this path in
SKIPPATHS = ['include', 'WEB-INF']
- Skip all files and subdirectories in these paths.
EXTS = ['html', 'htm']
- Validate files with these extensions.
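How these four options could work together is sketched below for current Python 3. The function name find_files is hypothetical. The sketch assumes that SKIPPATHS entries are plain directory names that are skipped wherever they occur below VALIDATEPATH; the script itself may interpret them differently.

```python
import os


def find_files(fileserver_root, validate_path, skip_paths, exts):
    """Collect files below fileserver_root/validate_path with a wanted extension."""
    start = os.path.join(fileserver_root, validate_path)
    wanted = {"." + ext.lower().lstrip(".") for ext in exts}
    found = []
    for dirpath, dirnames, filenames in os.walk(start):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_paths)
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in wanted:
                found.append(os.path.join(dirpath, name))
    return found
```

With the example values above, a call would look like find_files('e:\\files', 'home', ['include', 'WEB-INF'], ['html', 'htm']).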
LOCALSERVERURL = 'http://example.org'
With UPLOAD=0 the validator GETs the files to validate from this URL. For local servers you might need to use the
UPLOAD=1 option, in which case this URL is not used at all.
=1 POST upload from
FILESERVERROOT\VALIDATEPATH) files or if
UPLOAD=1 (POST) the HTML to validate will be fetched from
LOCALSERVERURL, else from
FILESERVERROOT\VALIDATEPATH. Files to be fetched will always be the files in
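As far as it can be read from the description above, the combination of UPLOAD and UPLOADFROMURL decides which HTTP method is used and where the HTML comes from. The following sketch for current Python 3 is one reading of that description, not the script's code. Both function names are hypothetical, and treating UPLOAD=0 with UPLOADFROMURL=1 like plain UPLOAD=0 is an assumption.

```python
import os


def describe_mode(upload, upload_from_url):
    """Return (http_method, html_source) for one combination of the options."""
    if not upload:
        # UPLOAD=0: the validator itself GETs the page from the server.
        return ("GET", "validator fetches from LOCALSERVERURL")
    if upload_from_url:
        # UPLOAD=1 and UPLOADFROMURL=1: the script fetches the page, then POSTs it.
        return ("POST", "script fetches from LOCALSERVERURL")
    # UPLOAD=1 only: the script POSTs the file as stored on disk.
    return ("POST", "script reads from FILESERVERROOT/VALIDATEPATH")


def file_to_url(path, fileserver_root, local_server_url):
    """Map a local file below fileserver_root to its URL on the local server."""
    relative = os.path.relpath(path, fileserver_root)
    return local_server_url.rstrip("/") + "/" + relative.replace(os.sep, "/")
```

The URL mapping assumes that the directory layout below FILESERVERROOT mirrors the URL paths of the server, which need not hold for every site.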
REPORTDIR = '__validator'
- Reports are saved in this directory
OPENREPORTS = 0
- If =1, automatically open report pages in the default HTML viewer (normally a web browser).
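A small sketch of what REPORTDIR and OPENREPORTS amount to, for current Python 3. The function name save_report and the naming scheme of the report files are assumptions made for this example; the script may name and place its reports differently.

```python
import os
import pathlib
import webbrowser


def save_report(report_html, source_name, report_dir="__validator", open_reports=0):
    """Write one validator report into report_dir and optionally open it."""
    os.makedirs(report_dir, exist_ok=True)
    # Flatten the source path into a single file name (assumed naming scheme).
    safe_name = source_name.replace(os.sep, "_").replace("/", "_")
    report_path = os.path.join(report_dir, safe_name + ".report.html")
    with open(report_path, "w", encoding="utf-8") as handle:
        handle.write(report_html)
    if open_reports:
        # Opens the saved page in the system's default HTML viewer.
        webbrowser.open(pathlib.Path(report_path).resolve().as_uri())
    return report_path
```

Keeping OPENREPORTS at 0 is the safer choice for large sites, since every validated file would otherwise open its own browser page.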
Also see the comments in the files which should be enough to get you started.
- v1.7 090620
- BUGFIX: checks for Valid or Invalid adapted to changes of W3C HTML Validator HTML (the check is really naive!)
- BUGFIX: fixed saving of reports
- improvement of output
- added valid and invalid example HTML
- v1.6 060606
UPLOADFROMURL to retrieve files from e.g. a local server which the w3 validator may not be able to access. Useful if dynamic pages like .php or .jsp pages need to be validated.
- v1.5 040910
- bugfix: reported number of files validated was always 1
- v1.4 040908
modified httplib_multipart to use "text/html" as default content-type
- v1.3 040907
VALIDATEPATH to validate only parts of a website
renamed most options to more meaningful names
- v1.2 040906
httplib_multipart rewrite to use
rewrote most of the script
- v1.1 040906 not released
- upload on local validator works now (because of HTTP/1.1)
- v1.0 040903 not released
- first working version