Ticket #1446 (closed defect: fixed)
Unhandled exception with lxml installed
| Reported by: | Stuart | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 1.0 |
| Component: | Plugins | Keywords: | |
| Cc: | | Version (eg. 1.0r2700): | |
Description
I am running Flexget version 1.0r2672 and recently started getting a TV input plugin error. I am not exactly sure when this started happening but I have not made any config changes and it was working for months.
Any help/suggestions would be wonderful.
Here's a snippet from the config file:
tv:
  inputs:
    - rss:
        url: http://static.demonoid.me/rss/3.xml
        username: XXX
        password: XXX
    - rss:
        url: http://static.demonoid.me/rss/13.xml
        username: XXX
        password: XXX
  cookies: /home/XXX/.flexget/cookies.sqlite
  exists_series:
    - /archive/videos/Tv
    - /archive/incoming
Here are the errors I get:
2012-01-13 10:38 ERROR feed TV_Series BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/feed.py", line 357, in __run_plugin
return method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/inputs.py", line 35, in on_feed_input
result = method(feed, input_config)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/utils/cached_input.py", line 135, in wrapped_func
response = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugin.py", line 116, in wrapped_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/rss.py", line 239, in on_feed_input
rss = feedparser.parse(content)
File "/usr/local/lib/python2.7/dist-packages/feedparser-5.1-py2.7.egg/feedparser.py", line 3888, in parse
saxparser.parse(source)
File "/usr/lib/python2.7/dist-packages/drv_libxml2.py", line 176, in parse
SAXException("Read failed (no details available)"))
File "/usr/local/lib/python2.7/dist-packages/feedparser-5.1-py2.7.egg/feedparser.py", line 1828, in fatalError
raise exc
SAXException: Read failed (no details available)
2012-01-13 10:38 INFO feed TV_Series Aborting feed (plugin: inputs)
2012-01-13 10:38 VERBOSE verbose About undecided entries: They were created by input plugins but were not accepted because no (filter) plugin accepted them. If you want them to reach output, configure filters.
Attachments
Change History
comment:2 Changed 16 months ago by gray
I just got the same error after installing the lxml package. The problem disappeared after I uninstalled lxml.
comment:3 Changed 15 months ago by bluephoenix47
I get the same error. Seems to be related to requests, per tinkering and discussion with gazpachoking in IRC. See #1510 and #1342, which I'm also getting after a recent update.
I just upgraded to 1.0r2710. I tried requests 0.10.0 and 0.10.2; I get the errors with both. I'm running Arch Linux. I do not have lxml installed.
Correction: I did have lxml (though my initial search through my packages didn't find it, for some reason). And uninstalling it DOES fix all my problems. So, that solves a problem, but I need lxml for another program I regularly use (calibre). So why does lxml kill requests?
comment:4 Changed 15 months ago by bluephoenix47
I've managed to get some strange behavior when testing this. I've isolated the error to one or more of three inputs:
- rss: http://www.torlock.com/movies/rss.xml # TorLock (Movies)
- rss: http://torrentz.eu/feed_verified?q=movies # Torrentz (Verified Movies)
- rss: http://rss.thepiratebay.org/207
However, when I comment out any of them and run flexget --test, they all pass.
Stranger still, when I leave them all in and run flexget --test --feed=My_Movies-1080p,My_Movies-720p,Auto_Movies, they all pass. When run without specifying --feed=, several other feeds run first, processing some TV show RSS input.
Normally, when I run flexget --test with all the RSS inputs present, I get:
2012-02-21 10:56 ERROR feed My_Movies-1080p BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/flexget/feed.py", line 357, in __run_plugin
return method(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugins/input/inputs.py", line 35, in on_feed_input
result = method(feed, input_config)
File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/utils/cached_input.py", line 135, in wrapped_func
response = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugin.py", line 116, in wrapped_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugins/input/rss.py", line 231, in on_feed_input
rss = feedparser.parse(content)
File "/usr/lib/python2.7/site-packages/feedparser.py", line 3888, in parse
saxparser.parse(source)
File "/usr/lib/python2.7/site-packages/drv_libxml2.py", line 176, in parse
SAXException("Read failed (no details available)"))
File "/usr/lib/python2.7/site-packages/feedparser.py", line 1828, in fatalError
raise exc
SAXException: Read failed (no details available)
2012-02-21 10:56 INFO feed My_Movies-1080p Aborting feed (plugin: inputs)
2012-02-21 10:56 VERBOSE input_cache My_Movies-720p Restored 50 entries from cache
2012-02-21 10:56 VERBOSE input_cache My_Movies-720p Restored 50 entries from cache
2012-02-21 10:56 ERROR feed My_Movies-720p BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/flexget/feed.py", line 357, in __run_plugin
return method(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugins/input/inputs.py", line 35, in on_feed_input
result = method(feed, input_config)
File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/utils/cached_input.py", line 135, in wrapped_func
response = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugin.py", line 116, in wrapped_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugins/input/rss.py", line 231, in on_feed_input
rss = feedparser.parse(content)
File "/usr/lib/python2.7/site-packages/feedparser.py", line 3888, in parse
saxparser.parse(source)
File "/usr/lib/python2.7/site-packages/drv_libxml2.py", line 176, in parse
SAXException("Read failed (no details available)"))
File "/usr/lib/python2.7/site-packages/feedparser.py", line 1828, in fatalError
raise exc
SAXException: Read failed (no details available)
2012-02-21 10:56 INFO feed My_Movies-720p Aborting feed (plugin: inputs)
2012-02-21 10:56 VERBOSE input_cache Auto_movies Restored 50 entries from cache
2012-02-21 10:56 VERBOSE input_cache Auto_movies Restored 50 entries from cache
2012-02-21 10:56 ERROR feed Auto_movies BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/flexget/feed.py", line 357, in __run_plugin
return method(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugins/input/inputs.py", line 35, in on_feed_input
result = method(feed, input_config)
File "/usr/lib/python2.7/site-packages/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/utils/cached_input.py", line 135, in wrapped_func
response = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugin.py", line 116, in wrapped_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/flexget/plugins/input/rss.py", line 231, in on_feed_input
rss = feedparser.parse(content)
File "/usr/lib/python2.7/site-packages/feedparser.py", line 3888, in parse
saxparser.parse(source)
File "/usr/lib/python2.7/site-packages/drv_libxml2.py", line 176, in parse
SAXException("Read failed (no details available)"))
File "/usr/lib/python2.7/site-packages/feedparser.py", line 1828, in fatalError
raise exc
SAXException: Read failed (no details available)
I can make this bug go away by setting feedparser's PREFERRED_XML_PARSERS = [], or by uninstalling python-lxml, but that's not a satisfying solution. I'd really like to reproduce this bug, so I can get more information to the feedparser devs (who I'm currently working with on this).
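For anyone curious what the empty list changes, here is a minimal sketch using only the standard library. It assumes, based on the traceback (feedparser calls saxparser.parse and ends up in drv_libxml2.py), that feedparser hands PREFERRED_XML_PARSERS to xml.sax.make_parser; build_parser is a hypothetical stand-in for that step, not feedparser's actual code.

```python
import xml.sax


def build_parser(preferred):
    """Hypothetical stand-in for how feedparser is assumed to pick a parser."""
    # make_parser() tries each named driver in order and then falls back
    # to the standard library's default parser list.
    return xml.sax.make_parser(preferred)


# With an empty preference list, drv_libxml2 is never requested,
# so the standard library's default driver is used instead.
parser = build_parser([])
parser_module = type(parser).__module__
print(parser_module)
```

In a running program the equivalent would presumably be setting feedparser.PREFERRED_XML_PARSERS = [] before the first feedparser.parse() call. The attribute name is taken from the comment above and has not been checked here against every feedparser version.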
comment:5 Changed 15 months ago by lazybones
inputs:
- rss: http://www.ezrss.it/feed/
- rss: http://www.torlock.com/television/rss.xml
- rss: http://rss.bt-chat.com/?cat=9
2012-02-28 08:00 ERROR feed HR_Doc_RSS BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/feed.py", line 355, in __run_plugin
return method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/inputs.py", line 35, in on_feed_input
result = method(feed, input_config)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/utils/cached_input.py", line 135, in wrapped_func
response = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugin.py", line 116, in wrapped_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/rss.py", line 234, in on_feed_input
rss = feedparser.parse(content)
File "build/bdist.linux-x86_64/egg/feedparser.py", line 3888, in parse
saxparser.parse(source)
File "/usr/lib/python2.7/dist-packages/drv_libxml2.py", line 176, in parse
SAXException("Read failed (no details available)"))
File "build/bdist.linux-x86_64/egg/feedparser.py", line 1828, in fatalError
raise exc
SAXException: Read failed (no details available)
2012-02-28 08:00 INFO feed HR_Doc_RSS Aborting feed (plugin: inputs)
Most recent dump of an invalid feed response from that set of feeds:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>503 - Service Unavailable</title>
<style type="text/css">
body { font-size: 62.5%; }
#container {
font-size: 62.5%;
max-width: 600px;
margin: auto;
margin-top: 2%;
border: 4px solid #efefef;
padding: 0px 20px;
color: #444;
font-family: Verdana,helvetica,sans-serif;
font-size: 1.25em;
}
h1 { color: #6D84B4; font-size: 1.5em; }
#footer { text-align: right; margin-top: 25px; }
</style>
</head>
<body>
<div id="container">
<h1>Error 503 - Service Unavailable</h1>
<p>The server is temporarily unable to service your request due to maintenance downtime or capacity problems.<br>Please try again later.</p> <p id="footer">lighttpd/2.0.0</p>
</div>
</body>
</html>
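The dump above is an HTML error page, not a feed. As a rough illustration of how such a response could be spotted before it reaches the XML parser, here is a minimal sketch. looks_like_html is a hypothetical helper, not part of FlexGet or feedparser, and the prefix check is only a heuristic.

```python
def looks_like_html(body: bytes) -> bool:
    """Heuristic: does the response body start like an HTML page?

    Hypothetical helper for illustration only; not FlexGet or feedparser API.
    """
    # Remove a UTF-8 byte order mark if one is present.
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]
    # Compare the start of the document case-insensitively.
    head = body.lstrip()[:64].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Shortened stand-ins for the 503 page above and for a real feed.
error_page = b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<html></html>'
rss_page = b'<?xml version="1.0"?>\n<rss version="2.0"><channel/></rss>'

is_error_html = looks_like_html(error_page)
is_rss_html = looks_like_html(rss_page)
```

Where the HTTP status code is available (503 in this case), checking it before parsing would be the more direct test.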
comment:6 Changed 15 months ago by lazybones
2012-03-09 21:22 ERROR feed HR_Doc_RSS BUG: Unhandled error in plugin inputs: Read failed (no details available)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/feed.py", line 355, in __run_plugin
return method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/inputs.py", line 35, in on_feed_input
result = method(feed, input_config)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/event.py", line 20, in __call__
return self.func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/utils/cached_input.py", line 135, in wrapped_func
response = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugin.py", line 116, in wrapped_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/FlexGet-1.0-py2.7.egg/flexget/plugins/input/rss.py", line 235, in on_feed_input
rss = feedparser.parse(content)
File "build/bdist.linux-x86_64/egg/feedparser.py", line 3888, in parse
saxparser.parse(source)
File "/usr/lib/python2.7/dist-packages/drv_libxml2.py", line 176, in parse
SAXException("Read failed (no details available)"))
File "build/bdist.linux-x86_64/egg/feedparser.py", line 1828, in fatalError
raise exc
SAXException: Read failed (no details available)
comment:7 Changed 15 months ago by flexget
The ticket is still open with no mention of it being solved, so what is the point of posting the same error message?
comment:8 Changed 15 months ago by lazybones
Saw the error, did a search, glanced at the original error and thought it was different (some line numbers), and missed the fact that I had already posted my version.
Probably time to get some sleep... Sorry.
comment:9 Changed 13 months ago by kurtmckee
Howdy, I'm the maintainer for feedparser. This issue is being investigated on the feedparser issue tracker at:
comment:10 Changed 12 months ago by kurtmckee
I've attempted to recreate the problem on my machine and I'm unable to do so.
Stuart, lazybones, gray, and bluephoenix47, please post the simplest config files that result in a crash. I need to be able to copy and paste the configuration, so if you have to XXX usernames and passwords, please remove the entry completely.
For reference, I've tried using the following:
feeds:
  test feed:
    rss: http://www.ezrss.it/feed/
    rss: http://www.torlock.com/television/rss.xml
    rss: http://rss.bt-chat.com/?cat=9
but the output I get from flexget is:
2012-05-16 11:01 INFO manager Running database cleanup.
2012-05-16 11:01 VERBOSE details test feed Produced 40 entries.
2012-05-16 11:01 WARNING feed test feed Feed doesn't have any filter plugins, you should add (at least) one!
2012-05-16 11:01 VERBOSE details test feed Summary - Accepted: 0 (Rejected: 0 Undecided: 40 Failed: 0)
2012-05-16 11:01 WARNING feed test feed Feed doesn't have any output plugins, you should add (at least) one!
I'm using the following software versions:
Python 2.7.2
FlexGet 1.0r2880
lxml 2.3.4
libxml2 2.7.8
libxslt 1.1.26
feedparser 5.1.2
comment:11 Changed 12 months ago by kurtmckee
Egg on my face. My configuration file was invalid. I'm able to reproduce the error using the following:
feeds:
  test1:
    rss: http://www.ezrss.it/feed/
    accept_all: yes
    download: /home/kurt/tmp/flexget-output/
  test2:
    rss: http://www.torlock.com/television/rss.xml
    accept_all: yes
    download: /home/kurt/tmp/flexget-output/
  test3:
    rss: http://rss.bt-chat.com/?cat=9
    accept_all: yes
    download: /home/kurt/tmp/flexget-output/
comment:12 Changed 12 months ago by kurtmckee
Wow, debugging this sucks. After the first run with the config file above it took, like, 20 more tries to get the thing to crash again. It turns out that bt-chat.com was very occasionally returning an HTML page instead of an RSS feed. I used pdb to identify where in the document the XML parser was crashing and discovered it was due to an unescaped ampersand in the page's JavaScript. I've whittled down the config file and the HTML page to their simplest elements, and will attach both the config file and the crasher document after posting this. There are some real oddities to this:
- The HTML page must be parsed after some other feed. If it's run first, it won't crash.
- The first document must be a valid feed. If it returns a 404, for instance, the HTML page won't crash.
This problem can be handled in feedparser by catching SAXException rather than SAXParseException (SAXParseException is a subclass of SAXException anyway), which is weird, because libxml2 would normally throw a SAXParseException when encountering an unescaped ampersand! That, together with the conditional nature of this problem, suggests that there is a threading-related issue, or that some state isn't being cleaned up somewhere, et cetera. Figuring out why this is happening conditionally will make sure that whatever code is actually causing the problem gets fixed. Additionally, I want to have a unit test that demonstrates that this problem is fixed. *grin*
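The subclass relationship mentioned above can be checked with the standard library alone. The sketch below uses the stdlib expat parser, not libxml2 or feedparser, and parse_or_report is a hypothetical name; it only illustrates why a handler for SAXException also covers SAXParseException.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# SAXParseException derives from SAXException, so a handler for the
# base class catches both.
is_subclass = issubclass(xml.sax.SAXParseException, xml.sax.SAXException)


def parse_or_report(document: bytes) -> str:
    """Parse a document and return the class name of any SAX error raised.

    Hypothetical helper for illustration; not feedparser's actual code.
    """
    try:
        xml.sax.parseString(document, ContentHandler())
    except xml.sax.SAXException as exc:  # base class: subclasses are caught too
        return type(exc).__name__
    return "ok"


# An unescaped ampersand, similar in spirit to the one found in the page's script.
bad = b"<html><script>if (a && b) {}</script></html>"
good = b"<rss><channel><title>a &amp; b</title></channel></rss>"

bad_result = parse_or_report(bad)
good_result = parse_or_report(good)
```

Catching the base class costs nothing when the parser behaves normally, and it still catches the plain SAXException seen in the tracebacks in this ticket.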
Changed 12 months ago by kurtmckee
- Attachment btchat.html added
a "feed" that, when processed second, causes the XML parser to throw a SAXException
comment:13 Changed 12 months ago by gazpachoking
Nice work tracking that down. Appreciate the help on this. :)
comment:14 Changed 12 months ago by kurtmckee
Good news, everyone! This isn't a flexget or feedparser issue: the problem is that lxml.etree is somehow modifying the exception that gets thrown by the libxml2 parser. Typically the invalid character reference would throw SAXParseException, but if lxml.etree is imported that changes to SAXException. I'm going to upload a standalone script that demonstrates the issue after posting this comment.
My guess is that there is a code path that gets run after the first feed is parsed by feedparser that imports lxml.etree, which is why the HTML page couldn't be the first thing processed by flexget.
I won't have an opportunity before heading to work, but later today I'll file a bug report with lxml and come back to link this ticket to it. After that I'll try writing up a feedparser unit test that demonstrates the bug and fix the issue at the feedparser level.
Changed 12 months ago by kurtmckee
- Attachment lxml-etree-changes-exceptions.py added
demonstrate that merely importing lxml.etree causes exceptions to change
comment:15 Changed 12 months ago by kurtmckee
I've filed a ticket at the lxml issue tracker:
comment:16 Changed 12 months ago by kurtmckee
This is fixed in feedparser revision 710. It will be included in the next release.
comment:17 Changed 12 months ago by gazpachoking
Thanks a lot kurtmckee! We'll probably upgrade our requirements once the new release comes out to make sure pip updates feedparser to the fixed version.
comment:18 Changed 7 months ago by flexget
Because a new feedparser release with this fix isn't out yet, this is how to upgrade to the latest git version, which includes the fix:
bin/pip install --upgrade git+https://code.google.com/p/feedparser/
Maybe flexget can include that somehow in the version requirements in pavement.py.
comment:19 Changed 6 months ago by gazpachoking
- Summary changed from Unhandled error in plugin inputs to Unhandled exception with lxml installed
comment:20 Changed 13 days ago by paranoidi
- Status changed from new to closed
- Resolution set to fixed
Hopefully this is now fixed in the dependencies. Thanks to everyone for helping.

Some of your RSS feeds are returning invalid data; the RSS input plugin should handle that error (SAXException) better.