Tuesday 19 June 2018

sax - How do I stop parsing an XML document with IVBSAXXMLReader in Delphi?




In order to quickly parse some large XML documents in a Delphi (2007) program, I have implemented the IVBSAXContentHandler interface and use it like this:



FXMLReader := CoSAXXMLReader60.Create;
FXMLReader.contentHandler := Self;
FXMLReader.parseURL(FXmlFile);


This works fine, as long as I simply parse the whole file, but I'd like to stop once I found the content that I am looking for. So my implementation of IVBSAXContentHandler.startElement checks for some condition and when it is true should abort further parsing. I tried this:




procedure TContentHandler.startElement(var strNamespaceURI, strLocalName,  strQName: WideString; const oAttributes: IVBSAXAttributes);
begin
if SomeCondition then
SysUtils.Abort;
end;


Unfortunately this raises the rather unhelpful EOleException "Catastrophic failure". (I also tried raising a custom exception with the same result.)



MSDN says the following:





The ErrorHandler interface essentially allows the XMLReader to signal the ContentHandler implementation that it wants to abort processing. Conversely, ContentHandler implementations can indicate to the XMLReader that it wants to abort processing. This can be accomplished by simply raising an application-specific exception. This is especially useful for aborting processing once the implementation finds what it is looking for:




Private Sub IVBSAXContentHandler_characters(ByVal strChars As String)
' I found what I was looking for, abort processing
Err.Raise vbObjectError + errDone, "startElement", _
"I got what I want, let's go play!"
End Sub



So, apparently somehow I also need to implement the IVBSAXErrorHandler interface somehow. This interface needs three methods:



procedure TContentHandler.error(const oLocator: IVBSAXLocator;
var strErrorMessage: WideString; nErrorCode: Integer);
begin

end;


procedure TContentHandler.fatalError(const oLocator: IVBSAXLocator;
var strErrorMessage: WideString; nErrorCode: Integer);
begin

end;

procedure TContentHandler.ignorableWarning(const oLocator: IVBSAXLocator;
var strErrorMessage: WideString; nErrorCode: Integer);
begin


end;


and also must be assigned before calling the ParseURL method:



FXMLReader := CoSAXXMLReader60.Create;
FXMLReader.contentHandler := Self;
FXMLReader.errorHandler := Self;
FXMLReader.parseURL(FXmlFile);



Unfortunately that doesn't make any difference, because now the fatalError handler gets called with strErrorMessage = 'Catastrophic failure'. With an empty method body this still results in the above mentioned unhelpful EOleException "Catastrophic failure".



So, now I am out of ideas:




  • Do I need to implement something special in the errorhandler interface?

  • Do I need to raise a particular exception instead of EAbort?

  • Or am I missing something else?







EDIT:



Based on Ondrej Kelle's answer, here is the solution I finally used:



Declare the following constant:



const

// idea taken from Delphi 10.1 unit System.Win.ComObj:
EExceptionRaisedHRESULT = HResult(E_UNEXPECTED or (1 shl 29)); // turn on customer bit


Add two new fields to the TContentHandler class:



FExceptObject: TObject;
FExceptAddr: Pointer;



Add this code to the destructor:



FreeAndNil(FExceptObject);


Add a new method SafeCallException:



function TContentHandler.SafeCallException(ExceptObject: TObject; ExceptAddr: Pointer): HResult;
var
GUID: TGUID;

exc: Exception;
begin
if ExceptObject is Exception then begin
exc := Exception(ExceptObject);
// Create a copy of the exception object and store it in the FExceptObject field
FExceptObject := exc.NewInstance;
Exception(FExceptObject).Create(exc.Message);
Exception(FExceptObject).HelpContext := exc.HelpContext;
// Store the exception address in the FExceptAddr field
FExceptAddr := ExceptAddr;

// return a custom HRESULT
Result := EExceptionRaisedHRESULT;
end else begin
ZeroMemory(@GUID, SizeOf(GUID));
Result := HandleSafeCallException(ExceptObject, ExceptAddr, GUID, '', '');
end;
end;


Add an exception handler to the calling code:




var
exc: Exception;
begin
try
FXMLReader := CoSAXXMLReader60.Create;
FXMLReader.contentHandler := Self;
// we do not need an errorHandler
FXMLReader.parseURL(FXmlFile);
FXMLReader := nil;

except
on e: EOleException do begin
// Check for the custom HRESULT
if e.ErrorCode = EExceptionRaisedHRESULT then begin
// Check that the exception object is assigned
if Assigned(FExceptObject) then begin
exc := Exception(FExceptObject);
// set the pointer to NIL
FExceptObject := nil;
// raise the exception a the given address

raise exc at FExceptAddr;
end;
end;
// fallback: raise the original exception
raise;
end;
end;
end;



While this works for me, it has a serious flaw: It copies only the Message and the HelpContext property of the original exception. So, if there are more properties/fields, e.g.



EInOutError = class(Exception)
public
ErrorCode: Integer;
end;


These will not be initialized when the exception is re-raised in the calling code.




The advantage is that you will get the correct exception address in the debugger. Beware that you won't get the correct call stack.


Answer



Simply calling Abort; is fine. In this case, just override SafeCallException in your IVBSAXContentHandler implementor class:



function TContentHandler.SafeCallException(ExceptObject: TObject; ExceptAddr: Pointer): HRESULT;
begin
Result := HandleSafeCallException(ExceptObject, ExceptAddr, TGUID.Empty, '', '');
end;



HandleSafeCallException supplied in ComObj will cause EAbort you're raising to be translated into a HRESULT value E_ABORT which will then be translated back to EAbort by SafeCallError.



Alternatively, you can raise your own exception class, override SafeCallException to translate it into your specific HRESULT value and replace SafeCallErrorProc with your own to translate it back into your Delphi exception which you can then handle on the calling side.


No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...