W3C HTML batch Validator in Python
what is it
A script calling the W3C HTML Validator in batch mode. Adapted from a
Perl version.
Needs the httplib_multipart extension to Python's httplib module, originally from a Python Cookbook recipe. The version used and available here is updated to use httplib.HTTPConnection, which is available from Python 2.0 and also uses HTTP/1.1 (but does not work with Python versions below 2.0).
download
example package
validate.zip v1.7 090620, a complete example which should run unmodified.
individual files
- validate.py
- exampleconfig.txt (you may need to adapt the line endings to your OS)
- httplib_multipart.py v1.4 040908 (replace .txt with .py extension)
  This is a modified version of the script which uses "text/html" as the default content-type if mimetypes is unable to recognize it. This way it is possible to validate webpages like ".shtml", ".jsp", ".php" or ".py". In these cases you can only use the UPLOAD=1 option if UPLOADFROMURL=1 is also specified (from v1.6). The script fetches the files from the local LOCALSERVERURL in this case.
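The content-type fallback described above can be sketched roughly as follows. This is only an illustration built on the standard mimetypes module, not the actual code in httplib_multipart.py; the function name guess_content_type is hypothetical.

```python
import mimetypes


def guess_content_type(filename, default="text/html"):
    """Return the MIME type guessed from the file name, or `default`.

    Hypothetical helper: httplib_multipart.py may implement this differently.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    # guess_type() returns (None, None) for unknown extensions,
    # so fall back to the default in that case.
    return content_type or default
```

Which extensions count as "unknown" depends on the MIME tables of the Python installation and the operating system, so the fallback may apply to a different set of files on different machines.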
usage
> python validate.py exampleconfig.txt
Tested with Python 2.6.2 on Vista only.
config options
The config file is basically a Python file in which some variables are (re-)defined. You may specify the following options (this may not be the most elegant and safe way, but it works well for this short script). All options are optional and have reasonable default values (apart from your website-specific URL and path information, of course).
- VALIDATORURL = w3
  - URL to the validator. The default is the constant w3 = 'validator.w3.org:80'. You may want to use the URL of a locally installed copy of the W3C validator.
- FILESERVERROOT = 'e:\\files'
  - Absolute path to your local file root. On Unix use / as the separator, on Windows use \\. The script looks here for files to validate.
- VALIDATEPATH = 'home'
  - Validate all files starting from this path in FILESERVERROOT.
- SKIPPATHS = ['include', 'WEB-INF']
  - Skip all files and subdirectories in these paths.
- EXTS = ['html', 'htm']
  - Validate files with these extensions.
- LOCALSERVERURL = 'http://example.org'
  - If UPLOAD=0 the validator GETs the files to validate from this URL. For local servers you might need to use the UPLOAD=1 option, in which case this URL is not used at all.
- UPLOAD=1
  - If =1, POST files (uploaded from FILESERVERROOT\VALIDATEPATH); if =0, GET pages (from LOCALSERVERURL/VALIDATEPATH).
- UPLOADFROMURL=1
  - If =1 and UPLOAD=1 (POST), the HTML to validate will be fetched from LOCALSERVERURL, else from FILESERVERROOT\VALIDATEPATH. The list of files to fetch is always taken from the files in FILESERVERROOT\VALIDATEPATH.
- REPORTDIR = '__validator'
  - Reports are saved in this directory.
- OPENREPORTS = 0
  - If =1, automatically open report pages in the default HTML viewer (normally a web browser).
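How FILESERVERROOT, VALIDATEPATH, SKIPPATHS and EXTS might work together to select files can be sketched as below. This is a rough illustration, not the code in validate.py: the function name collect_files is hypothetical, and it assumes that SKIPPATHS entries are matched against directory names at any depth, which the real script may handle differently.

```python
import os


def collect_files(root, validate_path, skip_paths, exts):
    """Return paths (relative to root, '/'-separated) of files to validate.

    Hypothetical helper; assumes skip_paths holds plain directory names.
    """
    start = os.path.join(root, validate_path)
    wanted = set("." + ext.lower() for ext in exts)
    found = []
    for dirpath, dirnames, filenames in os.walk(start):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_paths)
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in wanted:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return found
```

A relative path returned this way could be joined to LOCALSERVERURL with a "/" to form the page URL used when UPLOAD=0 or UPLOADFROMURL=1.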
Also see the comments in the files which should be enough to get you started.
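Since the config file is plain Python in which variables are (re-)defined, loading it could look roughly like the sketch below. This is not the code from validate.py: load_config and DEFAULTS are hypothetical names, and apart from VALIDATORURL the "default" values are simply the example values from the option list, not confirmed defaults.

```python
import copy

w3 = "validator.w3.org:80"  # documented default for VALIDATORURL

# Placeholder defaults: only VALIDATORURL's default is documented. The other
# values are the examples from the option list, used here for illustration.
DEFAULTS = {
    "VALIDATORURL": w3,
    "FILESERVERROOT": "e:\\files",
    "VALIDATEPATH": "home",
    "SKIPPATHS": ["include", "WEB-INF"],
    "EXTS": ["html", "htm"],
    "LOCALSERVERURL": "http://example.org",
    "UPLOAD": 1,
    "UPLOADFROMURL": 1,
    "REPORTDIR": "__validator",
    "OPENREPORTS": 0,
}


def load_config(path):
    """Execute the config file as Python code on top of the defaults."""
    namespace = copy.deepcopy(DEFAULTS)
    namespace["w3"] = w3  # lets the config file write VALIDATORURL = w3
    with open(path) as config_file:
        source = config_file.read()
    # Runs arbitrary code from the file: only use with configs you trust.
    exec(compile(source, path, "exec"), namespace)
    # Keep only the known option names; drop builtins and helper names.
    return dict((name, namespace[name]) for name in DEFAULTS)
```

Executing the file keeps the config format trivial, at the cost that the config can run any Python code, so this approach is only suitable for files you wrote yourself.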
change history
- v1.7 090620
  - BUGFIX: checks for Valid or Invalid adapted to changes of W3C HTML Validator HTML (the check is really naive!)
  - BUGFIX: fixed saving of reports
  - improvement of output
  - added valid and invalid example HTML
- v1.6 060606
  - new option UPLOADFROMURL to retrieve files from e.g. a local server which the w3 validator may not be able to access. Useful if dynamic pages like .php or .jsp pages need to be validated.
- v1.5 040910
  - bugfix: reported number of files validated was always 1
- v1.4 040908
  - added option SKIPPATHS
  - modified httplib_multipart to use "text/html" as default content-type
- v1.3 040907
  - added option VALIDATEPATH to validate only parts of a website
  - renamed most options to more meaningful names
- v1.2 040906
  - httplib_multipart rewrite to use httplib.HTTPConnection
  - rewrote most of the script
- v1.1 040906 not released
  - upload on local validator works now (because of HTTP/1.1)
- v1.0 040903 not released
  - first working version