Opened 3 years ago

Closed 18 months ago

#1446 closed defect (fixed)

Unhandled exception with lxml installed

Reported by: Stuart
Owned by:
Priority: major
Milestone: 1.0
Component: Plugins
Keywords:
Cc:
Version (eg. 1.0r2700):

Description

I am running Flexget version 1.0r2672 and recently started getting a TV input plugin error. I am not exactly sure when this started happening but I have not made any config changes and it was working for months.

Any help/suggestions would be wonderful.

Here's a snippet from the config file:

  tv:
    inputs:
      - rss:
          url: http://static.demonoid.me/rss/3.xml
          username: XXX
          password: XXX
      - rss:
          url: http://static.demonoid.me/rss/13.xml
          username: XXX
          password: XXX
    cookies: /home/XXX/.flexget/cookies.sqlite
    exists_series:
      - /archive/videos/Tv
      - /archive/incoming

Here are the errors I get:

2012-01-13 10:38 ERROR    feed          TV_Series       BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/feed.py", line 357, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/inputs.py", line 35, in on_feed_input
    result = method(feed, input_config)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/utils/cached_input.py", line 135, in wrapped_func
    response = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugin.py", line 116, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/rss.py", line 239, in on_feed_input
    rss = feedparser.parse(content)
  File "/usr/local/lib/python2.7/dist-packages/feedparser-5.1-py2.7.egg/feedparser.py", line 3888, in parse
    saxparser.parse(source)
  File "/usr/lib/python2.7/dist-packages/drv_libxml2.py", line 176, in parse
    SAXException("Read failed (no details available)"))
  File "/usr/local/lib/python2.7/dist-packages/feedparser-5.1-py2.7.egg/feedparser.py", line 1828, in fatalError
    raise exc
SAXException: Read failed (no details available)
2012-01-13 10:38 INFO     feed          TV_Series       Aborting feed (plugin: inputs)
2012-01-13 10:38 VERBOSE  verbose                       About undecided entries: They were created by input plugins but were not accepted because no (filter) plugin accepted them. If you want them to reach output, configure filters.

Attachments (3)

config.yml (339 bytes) - added by kurtmckee 2 years ago.
config.yml that triggers the bug
btchat.html (50 bytes) - added by kurtmckee 2 years ago.
a "feed" that, when processed second, causes the XML parser to throw a SAXException
lxml-etree-changes-exceptions.py (1010 bytes) - added by kurtmckee 2 years ago.
demonstrate that merely importing lxml.etree causes exceptions to change


Change History (23)

comment:1 Changed 3 years ago by paranoidi

One of your RSS feeds is returning invalid data. The RSS input plugin should handle that error (SAXException) better.

comment:2 Changed 3 years ago by gray

I just got the same error after installing the lxml package. The problem disappeared after I uninstalled lxml.

comment:3 Changed 3 years ago by bluephoenix47

I get the same error. Seems to be related to requests, per tinkering and discussion with gazpachoking in IRC. See #1510 and #1342, which I'm also getting after a recent update.

I just upgraded to 1.0r2710. I tried requests 0.10.0 and 0.10.2; I get the errors with both. I'm running Arch Linux. I do not have lxml installed.


Correction: I did have lxml (though my initial search through my packages didn't find it, for some reason). And uninstalling it DOES fix all my problems. So, that solves a problem, but I need lxml for another program I regularly use (calibre). So why does lxml kill requests?

Last edited 3 years ago by bluephoenix47

comment:4 Changed 3 years ago by bluephoenix47

I've managed to get some strange behavior when testing this. I've isolated the error to one or more of three inputs:

      - rss: http://www.torlock.com/movies/rss.xml      # TorLock (Movies)
      - rss: http://torrentz.eu/feed_verified?q=movies  # Torrentz (Verified Movies)
      - rss: http://rss.thepiratebay.org/207   

However, when I comment out any one of them and run flexget --test, they all pass.

Stranger still, when I leave them all in and run flexget --test --feed=My_Movies-1080p,My_Movies-720p,Auto_Movies, they all pass. When I run without specifying --feed=, several other feeds run first, processing some TV show RSS input.

Normally when I run flexget --test with all the rss input present, I get:

2012-02-21 10:56 ERROR    feed          My_Movies-1080p BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flexget/feed.py", line 357, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugins/input/inputs.py", line 35, in on_feed_input
    result = method(feed, input_config)
  File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/utils/cached_input.py", line 135, in wrapped_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugin.py", line 116, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugins/input/rss.py", line 231, in on_feed_input
    rss = feedparser.parse(content)
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 3888, in parse
    saxparser.parse(source)
  File "/usr/lib/python2.7/site-packages/drv_libxml2.py", line 176, in parse
    SAXException("Read failed (no details available)"))
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 1828, in fatalError
    raise exc
SAXException: Read failed (no details available)
2012-02-21 10:56 INFO     feed          My_Movies-1080p Aborting feed (plugin: inputs)
2012-02-21 10:56 VERBOSE  input_cache   My_Movies-720p  Restored 50 entries from cache
2012-02-21 10:56 VERBOSE  input_cache   My_Movies-720p  Restored 50 entries from cache
2012-02-21 10:56 ERROR    feed          My_Movies-720p  BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flexget/feed.py", line 357, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugins/input/inputs.py", line 35, in on_feed_input
    result = method(feed, input_config)
  File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/utils/cached_input.py", line 135, in wrapped_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugin.py", line 116, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugins/input/rss.py", line 231, in on_feed_input
    rss = feedparser.parse(content)
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 3888, in parse
    saxparser.parse(source)
  File "/usr/lib/python2.7/site-packages/drv_libxml2.py", line 176, in parse
    SAXException("Read failed (no details available)"))
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 1828, in fatalError
    raise exc
SAXException: Read failed (no details available)
2012-02-21 10:56 INFO     feed          My_Movies-720p  Aborting feed (plugin: inputs)
2012-02-21 10:56 VERBOSE  input_cache   Auto_movies     Restored 50 entries from cache
2012-02-21 10:56 VERBOSE  input_cache   Auto_movies     Restored 50 entries from cache
2012-02-21 10:56 ERROR    feed          Auto_movies     BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flexget/feed.py", line 357, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugins/input/inputs.py", line 35, in on_feed_input
    result = method(feed, input_config)
  File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/utils/cached_input.py", line 135, in wrapped_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugin.py", line 116, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/flexget/plugins/input/rss.py", line 231, in on_feed_input
    rss = feedparser.parse(content)
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 3888, in parse
    saxparser.parse(source)
  File "/usr/lib/python2.7/site-packages/drv_libxml2.py", line 176, in parse
    SAXException("Read failed (no details available)"))
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 1828, in fatalError
    raise exc
SAXException: Read failed (no details available)

I can make this bug go away by setting feedparser's PREFERRED_XML_PARSERS = [] or by uninstalling python-lxml, but neither is a satisfying solution. I'd really like to reproduce this bug reliably, so I can get more information to the feedparser devs (with whom I'm currently working on this).
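A minimal sketch of why emptying that list can help, using only the standard library. It assumes (from reading feedparser 5.1, so check your own copy) that feedparser hands PREFERRED_XML_PARSERS to xml.sax.make_parser, which tries each named driver in order before falling back to its default reader. The helper name pick_sax_parser is hypothetical.

```python
import xml.sax


def pick_sax_parser(preferred):
    """Return the SAX reader that xml.sax selects for a preference list.

    Hypothetical helper for illustration; not part of feedparser or FlexGet.
    """
    # make_parser tries each module name in `preferred` first, then falls
    # back to the standard library's default reader list.
    return xml.sax.make_parser(preferred)


# With an empty preference list the drv_libxml2 driver is never tried,
# so the reader comes from the standard library.
reader = pick_sax_parser([])
print(type(reader).__module__)
```

If the libxml2 driver is the component misbehaving, this would explain why the workaround sidesteps the crash without fixing its cause.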

Last edited 3 years ago by bluephoenix47

comment:5 Changed 3 years ago by lazybones

    inputs:
      - rss: http://www.ezrss.it/feed/
      - rss: http://www.torlock.com/television/rss.xml
      - rss: http://rss.bt-chat.com/?cat=9

2012-02-28 08:00 ERROR    feed          HR_Doc_RSS      BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/feed.py", line 355, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/inputs.py", line 35, in on_feed_input
    result = method(feed, input_config)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/utils/cached_input.py", line 135, in wrapped_func
    response = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugin.py", line 116, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/rss.py", line 234, in on_feed_input
    rss = feedparser.parse(content)
  File "build/bdist.linux-x86_64/egg/feedparser.py", line 3888, in parse
    saxparser.parse(source)
  File "/usr/lib/python2.7/dist-packages/drv_libxml2.py", line 176, in parse
    SAXException("Read failed (no details available)"))
  File "build/bdist.linux-x86_64/egg/feedparser.py", line 1828, in fatalError
    raise exc
SAXException: Read failed (no details available)
2012-02-28 08:00 INFO     feed          HR_Doc_RSS      Aborting feed (plugin: inputs)

Most recent dump of an invalid feed response from that set of feeds:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
	<head>
		<title>503 - Service Unavailable</title>
		<style type="text/css">
			body { font-size: 62.5%; }
			#container {
				font-size: 62.5%;
				max-width: 600px;
				margin: auto;
				margin-top: 2%;
				border: 4px solid #efefef;
				padding: 0px 20px;
				color: #444;
				font-family: Verdana,helvetica,sans-serif;
				font-size: 1.25em;
			}
			h1 { color: #6D84B4; font-size: 1.5em; }
			#footer { text-align: right; margin-top: 25px; }
		</style>
	</head>
	<body>
		<div id="container">
			<h1>Error 503 - Service Unavailable</h1>
<p>The server is temporarily unable to service your request due to maintenance downtime or capacity problems.<br>Please try again later.</p>			<p id="footer">lighttpd/2.0.0</p>
		</div>
	</body>
</html>
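
A response like the one above is an error page, not a feed, so one possible guard is to check it before handing it to a feed parser. The sketch below is only a heuristic under stated assumptions: looks_like_feed is a hypothetical helper, and the specific checks (status 200, no "html" in the content type, no HTML doctype at the start of the body) are illustrative choices, not what FlexGet or feedparser actually do.

```python
def looks_like_feed(status, content_type, body):
    """Heuristic: is this HTTP response worth handing to a feed parser?

    Hypothetical helper; the names and checks are assumptions.
    """
    # Anything other than a plain success (e.g. 503) is not a feed.
    if status != 200:
        return False
    # Servers that answer with an HTML page usually say so.
    if "html" in content_type.lower():
        return False
    # Last resort: peek at the start of the body itself.
    head = body.lstrip()[:200].lower()
    return not (head.startswith("<!doctype html") or head.startswith("<html"))


print(looks_like_feed(503, "text/html", "<!DOCTYPE HTML PUBLIC ...>"))
```

A check of this kind would only skip obviously wrong responses; malformed feeds would still need error handling in the parser.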

comment:6 Changed 3 years ago by lazybones

r2791

2012-03-09 21:22 ERROR    feed          HR_Doc_RSS      BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/feed.py", line 355, in __run_plugin
    return method(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/inputs.py", line 35, in on_feed_input
    result = method(feed, input_config)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/utils/cached_input.py", line 135, in wrapped_func
    response = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugin.py", line 116, in wrapped_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/rss.py", line 235, in on_feed_input
    rss = feedparser.parse(content)
  File "build/bdist.linux-x86_64/egg/feedparser.py", line 3888, in parse
    saxparser.parse(source)
  File "/usr/lib/python2.7/dist-packages/drv_libxml2.py", line 176, in parse
    SAXException("Read failed (no details available)"))
  File "build/bdist.linux-x86_64/egg/feedparser.py", line 1828, in fatalError
    raise exc
SAXException: Read failed (no details available)

comment:7 Changed 3 years ago by flexget

The ticket is still open with no mention of it being solved, so what is the point of posting the same error message again?

comment:8 Changed 3 years ago by lazybones

Saw the error, did a search, glanced at the original error and thought it was different (some line numbers differ), and missed the fact that I had already posted my version.

Probably time to get some sleep... Sorry.

comment:9 Changed 2 years ago by kurtmckee

Howdy, I'm the maintainer for feedparser. This issue is being investigated on the feedparser issue tracker at:

https://code.google.com/p/feedparser/issues/detail?id=352

comment:10 Changed 2 years ago by kurtmckee

I've attempted to recreate the problem on my machine and I'm unable to do so.

Stuart, lazybones, gray, and bluephoenix47, please post the simplest config files that result in a crash. I need to be able to copy and paste the configuration, so if you have to XXX usernames and passwords, please remove the entry completely.

For reference, I've tried using the following:

feeds:
    test feed:
        rss: http://www.ezrss.it/feed/
        rss: http://www.torlock.com/television/rss.xml
        rss: http://rss.bt-chat.com/?cat=9

but the output I get from flexget is:

2012-05-16 11:01 INFO     manager                       Running database cleanup.
2012-05-16 11:01 VERBOSE  details       test feed       Produced 40 entries.
2012-05-16 11:01 WARNING  feed          test feed       Feed doesn't have any filter plugins, you should add (at least) one!
2012-05-16 11:01 VERBOSE  details       test feed       Summary - Accepted: 0 (Rejected: 0 Undecided: 40 Failed: 0)
2012-05-16 11:01 WARNING  feed          test feed       Feed doesn't have any output plugins, you should add (at least) one!

I'm using the following software versions:

Python 2.7.2
FlexGet 1.0r2880
lxml 2.3.4
libxml2 2.7.8
libxslt 1.1.26
feedparser 5.1.2

Last edited 2 years ago by kurtmckee

comment:11 Changed 2 years ago by kurtmckee

Egg on my face. My configuration file was invalid. I'm able to reproduce the error using the following:

feeds:
    test1:
        rss: http://www.ezrss.it/feed/
        accept_all: yes
        download: /home/kurt/tmp/flexget-output/
    test2:
        rss: http://www.torlock.com/television/rss.xml
        accept_all: yes
        download: /home/kurt/tmp/flexget-output/
    test3:
        rss: http://rss.bt-chat.com/?cat=9
        accept_all: yes
        download: /home/kurt/tmp/flexget-output/

comment:12 Changed 2 years ago by kurtmckee

Wow, debugging this sucks. After the first run with the config file above it took, like, 20 more tries to get the thing to crash again. It turns out that bt-chat.com was very occasionally returning an HTML page instead of an RSS feed. I used pdb to identify where in the document the XML parser was crashing and discovered it was due to an unescaped ampersand in its JavaScript. I've whittled down the config file and the HTML page to their simplest elements, and will attach both the config file and the crasher document after posting this. There are some real oddities to this:

  1. The HTML page must be parsed after some other feed. If it's run first, it won't crash.
  2. The first document must be a valid feed. If it returns a 404, for instance, the HTML page won't crash.
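
The unescaped-ampersand failure described above can be illustrated with the standard library alone. This is a sketch, not the attached crasher document: PAGE is a made-up stand-in, first_parse_error is a hypothetical helper, and it uses Python's default SAX reader rather than the libxml2 driver seen in the tracebacks.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# A tiny HTML-like document with an unescaped ampersand inside a script,
# standing in for the page described above (not the real attachment).
PAGE = b"<html><script>if (a && b) { run(); }</script></html>"


def first_parse_error(data):
    """Return (line, column, message) of the first XML error, or None."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        # SAXParseException carries the position of the failure.
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None


print(first_parse_error(PAGE))
```

A strict XML parser rejects the bare ampersand, whereas the same text written as &amp;amp; would parse cleanly.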

This problem can be handled in feedparser by catching SAXException rather than SAXParseException (SAXParseException is a subclass of SAXException anyway)... which is weird, because libxml2 would normally throw a SAXParseException when encountering an unescaped ampersand! That, together with the conditional nature of this problem, suggests that there is a threading-related issue, or that some state isn't being cleaned up somewhere, et cetera. Figuring out why this happens conditionally will make sure that whatever code is actually causing the problem gets fixed. Additionally, I want to have a unit test that demonstrates that this problem is fixed. *grin*
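
The subclass relationship mentioned above is what makes the broader except clause safe. A minimal sketch using only the standard library follows; parse_strict_or_none is a hypothetical helper, and this is not feedparser's actual fix, just the same idea in isolation.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def parse_strict_or_none(data):
    """Try a strict XML parse; return None instead of raising on failure.

    Hypothetical helper. Catching SAXException also catches
    SAXParseException, because the latter is a subclass of the former.
    """
    handler = ContentHandler()
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXException:
        # Covers both the specific and the generic error, whichever
        # the underlying driver happens to raise.
        return None
    return handler


print(issubclass(xml.sax.SAXParseException, xml.sax.SAXException))
print(parse_strict_or_none(b"<a>x & y</a>"))
```

Catching the base class trades a little precision for robustness: the caller no longer depends on which exception type a given driver raises.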

Last edited 2 years ago by kurtmckee

Changed 2 years ago by kurtmckee

config.yml that triggers the bug

Changed 2 years ago by kurtmckee

a "feed" that, when processed second, causes the XML parser to throw a SAXException

comment:13 Changed 2 years ago by gazpachoking

Nice work tracking that down. Appreciate the help on this. :)

comment:14 Changed 2 years ago by kurtmckee

Good news, everyone! This isn't a flexget or feedparser issue: the problem is that lxml.etree is somehow modifying the exception that gets thrown by the libxml2 parser. Typically the invalid character reference would throw SAXParseException, but if lxml.etree is imported that changes to SAXException. I'm going to upload a standalone script that demonstrates the issue after posting this comment.

My guess is that there is a code path that gets run after the first feed is parsed by feedparser that imports lxml.etree, which is why the HTML page couldn't be the first thing processed by flexget.
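
One way to test a guess like this is to log which XML-related modules are already imported before each feed is parsed. The sketch below only inspects sys.modules and never imports anything. loaded_xml_modules is a hypothetical helper, the watched names are taken from this ticket, and treating libxml2 as an importable Python module name is an assumption.

```python
import sys

# Module names taken from the tracebacks and comments in this ticket;
# "libxml2" as a Python module name is an assumption.
WATCHED = ("lxml.etree", "libxml2", "drv_libxml2")


def loaded_xml_modules(names=WATCHED):
    """Map each watched module name to whether it is already imported.

    Hypothetical diagnostic helper; it reads sys.modules only and
    never triggers an import itself.
    """
    return {name: name in sys.modules for name in names}


print(loaded_xml_modules())
```

Calling it before the first and second feeds would show whether lxml.etree appears in between.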

I won't have an opportunity before heading to work, but later today I'll file a bug report with lxml and come back to link this ticket to it. After that I'll try writing up a feedparser unit test that demonstrates the bug and fix the issue at the feedparser level.

Last edited 2 years ago by kurtmckee

Changed 2 years ago by kurtmckee

demonstrate that merely importing lxml.etree causes exceptions to change

comment:15 Changed 2 years ago by kurtmckee

I've filed a ticket at the lxml issue tracker:

https://bugs.launchpad.net/lxml/+bug/1001301

comment:16 Changed 2 years ago by kurtmckee

This is fixed in feedparser revision 710. It will be included in the next release.

comment:17 Changed 2 years ago by gazpachoking

Thanks a lot kurtmckee! We'll probably upgrade our requirements once the new release comes out to make sure pip updates feedparser to the fixed version.

Last edited 2 years ago by gazpachoking

comment:18 Changed 2 years ago by flexget

Because a feedparser release containing this fix isn't out yet, this is how to upgrade to the latest git version, which includes it:

bin/pip install --upgrade git+https://code.google.com/p/feedparser/

Maybe flexget can include that somehow in the version requirements in pavement.py

comment:19 Changed 2 years ago by gazpachoking

  • Summary changed from Unhandled error in plugin inputs to Unhandled exception with lxml installed

comment:20 Changed 18 months ago by paranoidi

  • Resolution set to fixed
  • Status changed from new to closed

Hopefully this is now fixed in the dependencies. Thanks to everyone who helped.
