| org.apache.commons.feedparser.AtomFeedParser | Line |
|---|---|
| What if there is no type attribute specified? What's the default? | 145 |
| Use xml:base to expand the URIs. | 283 |
| Move this code to MetaFeedParser... | 343 |
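
The xml:base item asks for relative URIs to be expanded against the in-scope base. This is a minimal sketch. It assumes the caller already tracks the xml:base value in effect for each element. `XmlBaseResolver` and `expand` are hypothetical names, and `java.net.URI.resolve` performs the RFC 2396-style resolution. On the first item: Atom 1.0 (RFC 4287) treats a missing `type` attribute as `"text"`. Whether that default applies here depends on which Atom version the parser targets.

```java
import java.net.URI;

/** Hypothetical helper: expands hrefs against the xml:base value in scope. */
final class XmlBaseResolver {
    private XmlBaseResolver() {}

    /** Returns href unchanged when no base is in scope, otherwise the resolved URI. */
    static String expand(String xmlBase, String href) {
        if (xmlBase == null || xmlBase.isEmpty()) {
            return href;
        }
        // URI.resolve applies RFC 2396 reference resolution; absolute hrefs come back unchanged.
        return URI.create(xmlBase).resolve(href).toString();
    }
}
```

For example, expanding `2004/05/entry.html` against `http://example.org/blog/` gives `http://example.org/blog/2004/05/entry.html`, while an absolute href passes through as-is.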

| org.apache.commons.feedparser.BaseParser | Line |
|---|---|
| Unify this with RSSFeedParser.getChildElementTextByName. | 103 |
| This can be rewritten to use getChild(). | 134 |

| org.apache.commons.feedparser.ContentDetector | Line |
|---|---|
| Look for the RDF namespace and the RSS DTD namespace. | 111 |
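
The ContentDetector item could start as plain string checks that run before any XML parsing. This is a rough sketch. It assumes the raw document is already decoded to a `String`, and `RssSniffer` is a hypothetical name. The RDF namespace URI is the one that RSS 0.9 and RSS 1.0 documents declare. The `rss-0.91.dtd` file name is an assumption based on the Netscape RSS 0.91 DOCTYPE.

```java
/** Hypothetical pre-parse checks for RSS flavours. */
final class RssSniffer {
    // Namespace declared on the rdf:RDF root of RSS 0.9 and RSS 1.0 documents.
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Assumed DTD file name from the Netscape RSS 0.91 DOCTYPE.
    static final String RSS_091_DTD = "rss-0.91.dtd";

    private RssSniffer() {}

    static boolean declaresRdfNamespace(String content) {
        return content.contains(RDF_NS);
    }

    /** True only when the DTD name appears inside the DOCTYPE declaration itself. */
    static boolean referencesRss091Dtd(String content) {
        int start = content.indexOf("<!DOCTYPE");
        if (start < 0) {
            return false;
        }
        int end = content.indexOf('>', start);
        String doctype = end < 0 ? content.substring(start) : content.substring(start, end);
        return doctype.contains(RSS_091_DTD);
    }
}
```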

| org.apache.commons.feedparser.FeedFilter | Line |
|---|---|
| Return an object here so that I can flag a bozo bit. | 74 |
| This isn't actually true. We should leave the BOM and remove the prolog anyway, because otherwise this will still break the parser. Come up with some tests for UTF-16 to see if I can get it to break, and then update this method. | 105 |
| When I benchmarked this code, it showed up as a MAJOR bottleneck, so we might want to optimize it a little more. | 185 |
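
For the UTF-16 experiments suggested in the FeedFilter item, a useful first step is to identify which byte order mark, if any, a document starts with. This is a minimal sketch, and `BomSniffer` is a hypothetical name. It only inspects the leading bytes. It does not strip the BOM or the prolog.

```java
/** Hypothetical helper: names the encoding implied by a leading byte order mark. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the charset name implied by the BOM, or null when there is none. */
    static String detect(byte[] b) {
        // UTF-32 marks are checked first because FF FE 00 00 also starts with the UTF-16LE mark FF FE.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) {   // compare as unsigned byte values
                return false;
            }
        }
        return true;
    }
}
```

One caveat applies. A UTF-16LE document whose first character after the BOM is U+0000 cannot be told apart from a UTF-32LE BOM by these bytes alone.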

| org.apache.commons.feedparser.FeedParserImpl | Line |
|---|---|
| When this is a JDOM or XML parser exception, we should detect whether we're working with an XHTML or HTML file and then parse it with an XFN/XOXO event listener. | 81 |
| If we return the WRONG content type here, we will break: `getBytes()`... UTF-16 and UTF-32 especially. We should also perform HTTP Content-Type parsing here to preserve the content type. This can be fixed by integrating our networking API from NewsMonster. | 98 |
| If this is XHTML, we need to handle it with either an XFN or an XOXO directory parser. There might be more metadata we need to parse here (I also wonder if this could be a chance to do autodiscovery). | 172 |
| If this is an UNKNOWN format, we need to throw an UnsupportedFeedException (which extends FeedParserException). | 179 |
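
The FeedParserImpl item about HTTP Content-Type parsing comes down mostly to reading the `charset` parameter from the header. This is a minimal sketch, and `ContentTypes.charsetOf` is a hypothetical helper. It does not handle a `;` inside a quoted parameter value.

```java
import java.util.Locale;

/** Hypothetical helper for HTTP Content-Type header values. */
final class ContentTypes {
    private ContentTypes() {}

    /** Returns the charset parameter (e.g. from: application/rss+xml; charset="UTF-8"), or null. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {        // parts[0] is the media type itself
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);   // strip surrounding quotes
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }
}
```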

| org.apache.commons.feedparser.HTMLFeedParser | Line |
|---|---|
| Only convert to using XFN if these types of links are detected. If it's just a plain XHTML file, we shouldn't use this interface. Also, FeedVersion needs to be called. | 44 |
| Only fire onItem when we have at least ONE valid XFN relation. | 73 |
| When the current rel is NOT part of any XFN spec, we should not be using the feed parser listener, because it might just be a nofollow link or similar. | 84 |
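
The HTMLFeedParser items suggest treating a link as XFN only when its `rel` contains a known XFN value, so that `rel="nofollow"` and similar values are ignored. This is a minimal sketch. It assumes the XFN 1.1 value list and Java 9+ (`Set.of`), and `XfnFilter` is a hypothetical name. A caller could fire `onItem` only when the returned list is non-empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical filter that keeps only XFN relationship values from a rel attribute. */
final class XfnFilter {
    // The XFN 1.1 relationship values.
    static final Set<String> XFN_VALUES = Set.of(
        "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
        "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
        "kin", "muse", "crush", "date", "sweetheart", "me");

    private XfnFilter() {}

    /** Returns the XFN values found in a whitespace-separated rel attribute, in order. */
    static List<String> xfnRelations(String rel) {
        List<String> out = new ArrayList<>();
        if (rel == null) {
            return out;
        }
        for (String token : rel.trim().split("\\s+")) {
            String value = token.toLowerCase(Locale.ROOT);
            if (XFN_VALUES.contains(value)) {
                out.add(value);
            }
        }
        return out;
    }
}
```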

| org.apache.commons.feedparser.MetaFeedParser | Line |
|---|---|
| This should be refactored into a new class called MetaFeedParser to be used by both Atom and RSS. Also, the date handling below needs to be generic. | 40 |
| Make sure RSS 0.9 and 0.91 are working. I just need to confirm, but I think they are working correctly. | 51 |
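
For the generic date handling mentioned in the MetaFeedParser item, one shared helper could first try the RFC 822 style used by RSS 2.0 and then the ISO 8601 style used by Atom and `dc:date`. This is a minimal sketch using `java.time` (Java 8+), and `FeedDates` is a hypothetical name.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical shared date parser for RSS and Atom timestamps. */
final class FeedDates {
    private FeedDates() {}

    /** Returns the parsed instant, or null if neither supported format matches. */
    static Instant parse(String text) {
        if (text == null) {
            return null;
        }
        String s = text.trim();
        try {
            // RFC 1123 is the four-digit-year profile of RFC 822 (e.g. "Sat, 13 Dec 2003 18:30:02 GMT").
            return ZonedDateTime.parse(s, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
        } catch (DateTimeParseException notRfc822) {
            // fall through to ISO 8601
        }
        try {
            // ISO 8601 with an offset (e.g. "2003-12-13T18:30:02Z").
            return OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
        } catch (DateTimeParseException notIso) {
            return null;
        }
    }
}
```

Treat this as a starting point. It does not handle named zones such as `EST`, date-only W3C-DTF values, or two-digit years, all of which appear in real feeds.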

| org.apache.commons.feedparser.RSSFeedParser | Line |
|---|---|
| Migrate this to XPath. | 151 |
| If this is a GUID with `isPermaLink="false"`, don't use it as the permalink. | 157 |
| Move to the onContent API defined within the AtomFeedParser and deprecate this body handling. | 208 |
| With malformed XML this could throw an NPE. Luckily, this format is rare now. | 222 |
| Move to the onContent API defined within the AtomFeedParser and deprecate this body handling. | 230 |
| Move to the onContent API defined within the AtomFeedParser and deprecate this body handling. | 254 |
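
The GUID item can be expressed as a small selection rule. RSS 2.0 says `isPermaLink` defaults to `true`, so only an explicit `"false"` disqualifies the GUID. This is a minimal sketch. `Permalinks.choose` is a hypothetical helper that takes the already-extracted element values.

```java
/** Hypothetical helper that picks an item's permalink from its guid and link. */
final class Permalinks {
    private Permalinks() {}

    /**
     * Uses the guid unless isPermaLink is explicitly "false" (RSS 2.0 treats a
     * missing attribute as true); otherwise falls back to the link, which may be null.
     */
    static String choose(String guid, String isPermaLinkAttr, String link) {
        boolean guidIsPermalink = isPermaLinkAttr == null
                || !isPermaLinkAttr.trim().equalsIgnoreCase("false");
        if (guid != null && !guid.trim().isEmpty() && guidIsPermalink) {
            return guid.trim();
        }
        return link;
    }
}
```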

| org.apache.commons.feedparser.locate.AnchorParser | Line |
|---|---|
| We do NOT obey base right now, and this is a BIG problem! | 33 |
| What if there are HTML comments here? We would parse links within comments, which isn't what we want. | 53 |
| How do we pass back the content of the href? | 56 |
| We SHOULD be using this, but it's not working right now. | 78 |
| Won't work with single quotes. | 118 |
| Won't work with `<a />`: `parse( "<a href=\"http://peerfear.org\" rel=\"linux\" title=\"linux\" >adf</a>", listener );` | 119 |
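
Several AnchorParser items (HTML comments, single quotes, `<a />`) can be illustrated with a regex-based extractor. This is a swapped-in technique rather than the AnchorParser's actual code, and a regex is not a full HTML parser. `AnchorHrefs` is a hypothetical name. The sketch removes comments first and then accepts double-quoted, single-quoted, or unquoted `href` values. It still ignores base, so relative hrefs come back unresolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based href extractor (illustration only, not a real HTML parser). */
final class AnchorHrefs {
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // <a ...href=VALUE where VALUE is "double", 'single', or unquoted; [^>] keeps the match inside one tag.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s(?:[^>]*?\\s)?href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>\"']+))",
        Pattern.CASE_INSENSITIVE);

    private AnchorHrefs() {}

    static List<String> find(String html) {
        String visible = COMMENT.matcher(html).replaceAll("");   // drop links inside comments
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(visible);
        while (m.find()) {
            for (int g = 1; g <= 3; g++) {                      // exactly one group matched
                if (m.group(g) != null) {
                    hrefs.add(m.group(g));
                    break;
                }
            }
        }
        return hrefs;
    }
}
```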

| org.apache.commons.feedparser.locate.AnchorParserListener | Line |
|---|---|
| Pass a fourth attribute that is the body of the anchor here. | 39 |

| org.apache.commons.feedparser.locate.EntityDecoder | Line |
|---|---|
| See FeedFilter.java for a list of all valid HTML entities. I should replace them with character literals in this situation. | 35 |
| There are a LOT more of these, and we need an exhaustive collection. | 44 |
|  | 51 |
| (Performance): do I have existing code that does this more efficiently? | 67 |
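
The EntityDecoder items describe replacing entities with character literals. This is a minimal sketch that decodes decimal and hex numeric references plus a deliberately small named-entity table, and it leaves unknown entities untouched. `Entities` is a hypothetical name. The sketch needs Java 9+ (`Map.of`, `Matcher.appendReplacement(StringBuilder, String)`), and a real table would need the exhaustive list the item asks for.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical entity decoder with a deliberately incomplete named-entity table. */
final class Entities {
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'",
        "nbsp", "\u00A0", "copy", "\u00A9");
    // Digit counts are capped so Integer.parseInt cannot overflow.
    private static final Pattern REF = Pattern.compile(
        "&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);");

    private Entities() {}

    static String decode(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder(s.length());
        while (m.find()) {
            String ref = m.group(1);
            String replacement;
            if (ref.startsWith("#x") || ref.startsWith("#X")) {
                replacement = fromCodePoint(Integer.parseInt(ref.substring(2), 16), m.group());
            } else if (ref.startsWith("#")) {
                replacement = fromCodePoint(Integer.parseInt(ref.substring(1)), m.group());
            } else {
                replacement = NAMED.getOrDefault(ref, m.group());   // unknown names stay as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String fromCodePoint(int cp, String original) {
        return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
    }
}
```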

| org.apache.commons.feedparser.locate.FeedLocator | Line |
|---|---|
| If we were GIVEN an RSS/Atom/OPML/etc. file, then we should just attempt to use it and return a FeedList with just one entry. I think we should parse it first to make sure it's valid XML, and then move forward. The downside is that this wastes CPU if it's HTML content. | 81 |
| Add UNIT TESTS for Yahoo Groups and Flickr. | 116 |
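
The FeedLocator item proposes parsing a given document first to check that it is well-formed XML with a feed root element. This is a minimal sketch using the JDK's DOM parser, and `FeedSniffer` is a hypothetical name. The set of root names (`rss`, `RDF`, `feed`, `opml`) is an assumption meant to cover RSS, RSS 1.0, Atom, and OPML. The sketch does not disable external DTD loading, so a DOCTYPE that points at a remote DTD may trigger a network fetch.

```java
import java.io.ByteArrayInputStream;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical check: is this byte content a well-formed feed document? */
final class FeedSniffer {
    // Assumed root local names: RSS 0.9x/2.0, RSS 1.0 (rdf:RDF), Atom, OPML.
    private static final Set<String> FEED_ROOTS = Set.of("rss", "RDF", "feed", "opml");

    private FeedSniffer() {}

    static boolean looksLikeFeed(byte[] content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);          // so rdf:RDF reports local name "RDF"
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Fail fast on malformed input instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {}
                @Override public void error(SAXParseException e) {}
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            Element root = builder.parse(new ByteArrayInputStream(content)).getDocumentElement();
            return FEED_ROOTS.contains(root.getLocalName());
        } catch (Exception e) {
            return false;                             // not well-formed XML, so not a feed
        }
    }
}
```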

| org.apache.commons.feedparser.locate.LinkLocator | Line |
|---|---|
| If it's at the same directory level, we should prioritize it. For example: | 80 |
| What happens if the Feed Parser is used to aggregate feeds on the localhost? This will break that. Brad Neuberg, bkn3@columbia.edu | 97 |
| We should assert that these feeds are from the SAME domain, not a link to another feed. | 109 |
| This is a hack, Brad Neuberg, bkn3@columbia.edu | 210 |
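
The same-domain check in the LinkLocator item could compare hosts after resolving each candidate link against the page URL. This is a minimal sketch using `java.net.URI`, and `SameSite.sameHost` is a hypothetical helper. It compares exact host names, so `www.example.org` and `example.org` count as different hosts.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical same-host check for candidate feed links. */
final class SameSite {
    private SameSite() {}

    /** True when candidate (absolute or relative) resolves to the same host as pageUrl. */
    static boolean sameHost(String pageUrl, String candidate) {
        try {
            URI page = new URI(pageUrl);
            URI target = page.resolve(new URI(candidate));   // relative links inherit the page host
            String a = page.getHost();
            String b = target.getHost();
            return a != null && b != null && a.equalsIgnoreCase(b);
        } catch (URISyntaxException e) {
            return false;                                    // unparseable links are rejected
        }
    }
}
```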

| org.apache.commons.feedparser.locate.ProbeLocator | Line |
|---|---|
| This doesn't seem like the right place for this. Can you document this more? It's cryptic. Brad Neuberg, bkn3@columbia.edu. | 155 |

| org.apache.commons.feedparser.locate.ResourceExpander | Line |
|---|---|
| What happens if resource is a "file://" scheme? | 81 |
| Brad says this method is totally broken. | 265 |

| org.apache.commons.feedparser.locate.TestAnchorParser | Line |
|---|---|
| This won't work because it has an image. | 82 |
| What about unit tests that span multiple lines? | 85 |
| Don't find anchors in comments. `doTest( 0, "file:tests/anchor/anchor6.html" );` | 97 |
| Won't work with `<a />`. | 104 |

| org.apache.commons.feedparser.locate.blogservice.Blosxom | Line |
|---|---|
| This might be fragile, but it is used across all of the Blosxom blogs I have looked at so far. Brad Neuberg, bkn3@columbia.edu | 69 |

| org.apache.commons.feedparser.locate.blogservice.ExpressionEngine | Line |
|---|---|
| No way to detect this type of weblog right now. | 56 |
| Implement. | 77 |

| org.apache.commons.feedparser.locate.blogservice.Manila | Line |
|---|---|
| No way to detect this type of weblog right now. | 56 |

| org.apache.commons.feedparser.locate.blogservice.iBlog | Line |
|---|---|
| No way to detect this type of weblog right now. | 56 |

| org.apache.commons.feedparser.network.BaseResourceRequest | Line |
|---|---|
| This needs to use the cache. | 230 |

| org.apache.commons.feedparser.network.NetworkException | Line |
|---|---|
| `java.lang.NumberFormatException: For input string: "fie" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:468) at java.lang.Integer.parseInt(Integer.java:518) at org.peerfear.newsmonster.network.NetworkException.getResponseCode(NetworkException.java:142) at ksa.robot.FeedTask._doTaskLogFailure(FeedTask.java:264) at ksa.robot.FeedTask.run(FeedTask.java:202) at ksa.robot.TaskThread.doProcessTask(TaskThread.java:298) at ksa.robot.TaskThread.run(TaskThread.java:111)` | 117 |
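
The NetworkException trace shows `Integer.parseInt` receiving `"fie"` inside `getResponseCode`. One defensive option is to validate the status line before parsing it and to return a sentinel instead of throwing. This is a minimal sketch. It assumes the code is taken from a raw status line such as `HTTP/1.1 404 Not Found`, and `StatusLines.responseCode` is a hypothetical helper. The `-1` sentinel mirrors `java.net.HttpURLConnection.getResponseCode`, which returns -1 when the response is not valid HTTP.

```java
/** Hypothetical status-line parser that never throws NumberFormatException. */
final class StatusLines {
    private StatusLines() {}

    /** Returns the three-digit status code, or -1 if the text is not an HTTP status line. */
    static int responseCode(String statusLine) {
        if (statusLine == null) {
            return -1;
        }
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            return -1;
        }
        String code = parts[1];
        if (code.length() != 3) {
            return -1;
        }
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (c < '0' || c > '9') {      // ASCII digits only
                return -1;
            }
        }
        return Integer.parseInt(code);     // safe: exactly three ASCII digits
    }
}
```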

| org.apache.commons.feedparser.network.ResourceRequestFactory | Line |
|---|---|
| (Should this be a linked list?) | 77 |
| Remove this until we figure out how to do proxy authentication: `java.net.Authenticator.setDefault ( new Authenticator() );` | 204 |

| org.apache.commons.feedparser.network.URLCookieManager | Line |
|---|---|
| How can we make sure older sites get deleted? There's no need for this to grow to an infinite size. | 27 |
| Merge these... new cookies into the site cookies. | 94 |
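
For the URLCookieManager growth problem, one common approach is a size-bounded map that evicts the least recently used site. This is a minimal sketch built on `LinkedHashMap`'s access order and `removeEldestEntry`. `BoundedSiteCache` and its limit are hypothetical. The map is not synchronized, so a shared instance would need external locking (for example `Collections.synchronizedMap`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-site store that keeps at most maxSites entries, evicting the least recently used. */
final class BoundedSiteCache<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSites;

    BoundedSiteCache(int maxSites) {
        super(16, 0.75f, true);   // accessOrder = true: get() moves an entry to the most-recent end
        this.maxSites = maxSites;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxSites; // called after each put; drops the least recently used site
    }
}
```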

| org.apache.commons.feedparser.network.URLResourceRequest | Line |
|---|---|
| Do smart user agent detection: if this is a .html file, we can set it to use Mozilla, and if not, we can use NewsMonster. `_urlConnection.setRequestProperty( "Referer", REFERER );` | 136 |
| Performance improvement: don't write to disk and then read from disk? | 348 |

| org.apache.commons.feedparser.sax.RSSFeedParser | Line |
|---|---|
| Move to a FastStringBuffer that's not synchronized. | 315 |
| It might be possible to call an item again without a member, in which case the value from the LAST item is used... This needs to be a fatal error, and we need to clear ... | 475 |
| Is there a more efficient way to clear a buffer than this? | 486 |
| Also, only do this if it's necessary and content has actually been added. This will save some performance. | 488 |
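
The sax.RSSFeedParser items (an unsynchronized buffer, a cheaper clear, and clearing only when something was added) fit together in one small accumulator. `StringBuilder` is the JDK's unsynchronized counterpart to `StringBuffer`, and `setLength(0)` empties it while keeping its capacity. This is a minimal sketch, and `TextAccumulator` is a hypothetical name rather than a class in the parser.

```java
/** Hypothetical accumulator for text delivered through SAX characters() callbacks. */
final class TextAccumulator {
    // Unsynchronized; a SAX handler is normally driven by a single thread.
    private final StringBuilder buffer = new StringBuilder(256);

    void append(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    boolean hasContent() {
        return buffer.length() > 0;
    }

    /** Returns the collected text (trimmed) and resets the buffer; returns "" when nothing was added. */
    String take() {
        if (buffer.length() == 0) {
            return "";                 // skip the copy and the clear when nothing was appended
        }
        String text = buffer.toString().trim();
        buffer.setLength(0);           // clears without reallocating the backing array
        return text;
    }
}
```

Calling `take()` at every element boundary also keeps text from one item from leaking into the next, which may help with the stale-value concern at line 475.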

| org.apache.commons.feedparser.test.TestProbeLocator | Line |
|---|---|
| Test this. | 159 |
| Test this. | 164 |
| We should be able to pass this test once we expand resources inside the Feed Parser; we don't do this yet. Brad Neuberg, bkn3@columbia.edu | 190 |
| Use the IO package from NewsMonster for this. | 545 |