Blame: lib/parse/html.py - sqlmapproject/sqlmap

Automatic SQL injection and database takeover tool

Last preparations for DREI 2019-05-08 12:47:52 +02:00			`#!/usr/bin/env python`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
			`"""`
Year bump 2025-01-02 00:51:30 +01:00			`Copyright (c) 2006-2025 sqlmap developers (https://sqlmap.org/)`
Replacing doc/COPYING to LICENSE 2017-10-11 14:50:46 +02:00			`See the file 'LICENSE' for copying permission`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`"""`

			`import re`

			`from xml.sax.handler import ContentHandler`

Minor update regarding #3129 2018-06-01 10:21:59 +02:00			`from lib.core.common import urldecode`
some code refactoring 2010-04-16 19:57:00 +00:00			`from lib.core.common import parseXmlFile`
Minor layout adjustments, minor fixes and updated changelog 2008-11-17 00:00:54 +00:00			`from lib.core.data import kb`
			`from lib.core.data import paths`
Another patch regarding #4530 2021-01-07 14:20:03 +01:00			`from lib.core.settings import HEURISTIC_PAGE_SIZE_THRESHOLD`
fix for one of those more complex bugs (comparison was returning None while original page and/or page template were already had already DBMS error inside) 2010-12-24 12:13:48 +00:00			`from lib.core.threads import getCurrentThreadData`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
Minor style update (capitalization of leftover class names) 2012-12-06 13:46:24 +01:00			`class HTMLHandler(ContentHandler):`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`    """`
			`    This class defines methods to parse the input HTML page to`
			`    fingerprint the back-end database management system`
			`    """`

			`    def __init__(self, page):`
code reviewing part 2 2011-01-15 12:53:40 +00:00			`        ContentHandler.__init__(self)`

minor update regarding boolean logic comparison mechanism 2012-03-30 09:42:58 +00:00			`        self._dbms = None`
Some more optimization 2016-04-08 15:30:25 +02:00			`        self._page = (page or "")`
Fixes #4096 2020-01-31 21:51:02 +01:00			`        try:`
			`            self._lower_page = self._page.lower()`
			`        except SystemError: # https://bugs.python.org/issue18183`
			`            self._lower_page = None`
Minor update regarding #3129 2018-06-01 10:21:59 +02:00			`        self._urldecoded_page = urldecode(self._page)`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
Minor code restyling 2011-04-30 13:20:05 +00:00			`        self.dbms = None`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
minor update regarding boolean logic comparison mechanism 2012-03-30 09:42:58 +00:00			`    def _markAsErrorPage(self):`
			`        threadData = getCurrentThreadData()`
			`        threadData.lastErrorPage = (threadData.lastRequestUID, self._page)`

After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`    def startElement(self, name, attrs):`
Some more optimization 2016-04-08 15:30:25 +02:00			`        if self.dbms:`
			`            return`

After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`        if name == "dbms":`
minor update regarding boolean logic comparison mechanism 2012-03-30 09:42:58 +00:00			`            self._dbms = attrs.get("value")`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
minor refactoring/optimization 2011-11-16 16:06:21 +00:00			`        elif name == "error":`
Some more optimization 2016-04-08 15:30:25 +02:00			`            regexp = attrs.get("regexp")`
			`            if regexp not in kb.cache.regex:`
If it works, don't touch. I touched 2017-10-31 11:38:09 +01:00			`                keywords = re.findall(r"\w+", re.sub(r"\\.", " ", regexp))`
Some more optimization 2016-04-08 15:30:25 +02:00			`                keywords = sorted(keywords, key=len)`
			`                kb.cache.regex[regexp] = keywords[-1].lower()`

Minor patch 2020-02-27 14:31:43 +01:00			`            if ('|' in regexp or kb.cache.regex[regexp] in (self._lower_page or kb.cache.regex[regexp])) and re.search(regexp, self._urldecoded_page, re.I):`
minor update regarding boolean logic comparison mechanism 2012-03-30 09:42:58 +00:00			`                self.dbms = self._dbms`
			`                self._markAsErrorPage()`
Adding support for MemSQL (MySQL fork) 2020-01-20 23:11:37 +01:00			`                kb.forkNote = kb.forkNote or attrs.get("fork")`
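The matching done in `HTMLHandler.startElement` can be reproduced outside sqlmap with the standard library's `xml.sax`. What follows is a minimal sketch under stated assumptions, not sqlmap's code. `ERRORS_XML` is a hypothetical, heavily trimmed stand-in for the file at `paths.ERRORS_XML`. The `_KEYWORD_CACHE` dict plays the role of `kb.cache.regex`. URL-decoding, the thread-data bookkeeping (`_markAsErrorPage`) and the `fork` note are left out.

The sketch shows the two ideas in the handler. First, it remembers the `value` of the most recent `<dbms>` element. Second, before running an `<error>` pattern, it checks cheaply whether the pattern's longest literal word appears in the lowercased page, with backslash escapes blanked out first. Patterns containing `|` skip that pre-check, presumably because the longest word may belong to only one alternative.

```python
import re
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical, heavily trimmed stand-in for sqlmap's errors XML file
ERRORS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
    <dbms value="MySQL">
        <error regexp="SQL syntax.*MySQL"/>
        <error regexp="Warning.*mysql_.*"/>
    </dbms>
    <dbms value="PostgreSQL">
        <error regexp="PostgreSQL.*ERROR"/>
        <error regexp="Warning.*\\Wpg_.*"/>
    </dbms>
</root>
"""

_KEYWORD_CACHE = {}  # regexp -> longest literal keyword (plays the role of kb.cache.regex)


def longest_keyword(regexp):
    """Longest word of the pattern after blanking out backslash escapes, lowercased."""
    if regexp not in _KEYWORD_CACHE:
        keywords = re.findall(r"\w+", re.sub(r"\\.", " ", regexp))
        _KEYWORD_CACHE[regexp] = sorted(keywords, key=len)[-1].lower()
    return _KEYWORD_CACHE[regexp]


class ErrorPageHandler(ContentHandler):
    def __init__(self, page):
        ContentHandler.__init__(self)
        self._page = page or ""
        self._lower_page = self._page.lower()
        self._current_dbms = None  # value of the enclosing <dbms> element
        self.dbms = None           # first DBMS whose error pattern matched

    def startElement(self, name, attrs):
        if self.dbms:              # already identified: ignore remaining elements
            return
        if name == "dbms":
            self._current_dbms = attrs.get("value")
        elif name == "error":
            regexp = attrs.get("regexp")
            # cheap substring pre-check before the full case-insensitive regex search;
            # patterns containing '|' always go straight to the regex search
            if ("|" in regexp or longest_keyword(regexp) in self._lower_page) \
                    and re.search(regexp, self._page, re.I):
                self.dbms = self._current_dbms


def detect_dbms(page):
    handler = ErrorPageHandler(page)
    xml.sax.parseString(ERRORS_XML, handler)
    return handler.dbms


print(detect_dbms("You have an error in your SQL syntax; check the manual that "
                  "corresponds to your MySQL server version"))
print(detect_dbms("Warning: pg_query(): Query failed: ERROR:  syntax error"))
```

Because the handler returns early once `self.dbms` is set, the order of `<dbms>` entries in the XML decides which DBMS wins when a page matches patterns from several of them.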
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
Minor code adjustments 2008-11-17 00:13:49 +00:00			`def htmlParser(page):`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`    """`
			`    This function calls a class that parses the input HTML page to`
			`    fingerprint the back-end database management system`
Minor update of testing 2020-01-03 13:46:12 +01:00
			`    >>> from lib.core.enums import DBMS`
			`    >>> htmlParser("Warning: mysql_fetch_array() expects parameter 1 to be resource") == DBMS.MYSQL`
			`    True`
			`    >>> threadData = getCurrentThreadData()`
			`    >>> threadData.lastErrorPage = None`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`    """`

Another patch regarding #4530 2021-01-07 14:20:03 +01:00			`    page = page[:HEURISTIC_PAGE_SIZE_THRESHOLD]`

Minor code adjustments 2008-11-17 00:13:49 +00:00			`    xmlfile = paths.ERRORS_XML`
Minor style update (capitalization of leftover class names) 2012-12-06 13:46:24 +01:00			`    handler = HTMLHandler(page)`
Speed optimization(s) 2016-09-09 11:06:38 +02:00			`    key = hash(page)`

Patch for sporadic --parse-errors in generic SQL errors (e.g. CrateDB) 2020-02-02 22:01:57 +01:00			`    # generic SQL warning/error messages`
			`    if re.search(r"SQL (warning|error|syntax)", page, re.I):`
			`        handler._markAsErrorPage()`

Speed optimization(s) 2016-09-09 11:06:38 +02:00			`    if key in kb.cache.parsedDbms:`
			`        retVal = kb.cache.parsedDbms[key]`
			`        if retVal:`
			`            handler._markAsErrorPage()`
			`        return retVal`
minor optimization (only way to prematurely stop SAX parser) 2011-01-23 10:12:01 +00:00
revert 2011-01-23 11:21:27 +00:00			`    parseXmlFile(xmlfile, handler)`
After the storm, a restore.. 2008-10-15 15:38:22 +00:00
Minor layout adjustments, minor fixes and updated changelog 2008-11-17 00:00:54 +00:00			`    if handler.dbms and handler.dbms not in kb.htmlFp:`
fix for one of those more complex bugs (comparison was returning None while original page and/or page template were already had already DBMS error inside) 2010-12-24 12:13:48 +00:00			`        kb.lastParserStatus = handler.dbms`
Minor layout adjustments, minor fixes and updated changelog 2008-11-17 00:00:54 +00:00			`        kb.htmlFp.append(handler.dbms)`
fix for one of those more complex bugs (comparison was returning None while original page and/or page template were already had already DBMS error inside) 2010-12-24 12:13:48 +00:00			`    else:`
			`        kb.lastParserStatus = None`
Minor layout adjustments, minor fixes and updated changelog 2008-11-17 00:00:54 +00:00
Speed optimization(s) 2016-09-09 11:06:38 +02:00			`    kb.cache.parsedDbms[key] = handler.dbms`

After the storm, a restore.. 2008-10-15 15:38:22 +00:00			`    return handler.dbms`