<!--210121,210304-->
<h1>generating an RSS feed with python bottle</h1>
march 18, 2021<br>
#bottle #rss #webdev <br>
<br>
After a few months of quietly running my blog as practice, I want to start sharing my articles with other people. I looked over my favorite gamedev communities and saw that <a href="https://gamedev.net/">GameDev.net</a> apparently allows you to syndicate a blog through RSS. I never thought about making an RSS feed, so why not? <br>
<br>
<h2>what is RSS? </h2><br>
Before the massive centralized content platforms came into the mainstream, the internet was more like a constellation of individual websites. In lieu of algorithm-driven feeds and push notifications from major social media, RSS was designed to bring content from scattered websites into one place. <br>
<br>
RSS and its predecessors have been around since the 90s. RSS 2.0 (what blessfrey.me uses) was published in 2002. Even though it's old and falling in popularity, it's still used by some large aggregators today like <a href="https://publishercenter.google.com/publications#p:id=pfehome">Google News</a> and <a href="https://podcasters.spotify.com/submit">Spotify</a>. <br>
<br>
Here are a few examples from around the internet, a mix of large + small news websites and forums: <br>
<ul>
<li><a href="https://www.theguardian.com/world/rss">The Guardian</a> </li>
<li><a href="https://www.gamasutra.com/blogs/rss/">Gamasutra</a> </li>
<li><a href="https://www.polygon.com/rss/index.xml">Polygon</a> </li>
<li><a href="https://store.steampowered.com/feeds/news.xml">Steam</a> </li>
<li><a href="https://www.gamedev.net/tutorials/rss/">GameDev.net</a> </li>
<li><a href="https://www.nasa.gov/rss/dyn/chandra_images.rss">NASA's Chandra Mission</a> </li>
<li><a href="https://www.temptalia.com/feed/">Temptalia</a> </li>
</ul>
<br>
<h2>what goes into an RSS feed? </h2><br>
RSS files themselves are written in XML. They should contain the latest 10-15 entries along with things like their title, link, summary, date, and author. <br>
<br>
Blogging platforms like <a href="https://wordpress.org/support/article/wordpress-feeds/">WordPress</a> already take care of the RSS feed, and there's no shortage of third-party RSS creators on the internet. Since I have already written code to display + summarize my articles on the <a href="https://www.blessfrey.me/diary">diary</a> page, the 'latest' box in the diary's sidebar, and the 'news' box on the <a href="https://www.blessfrey.me/">index</a> page, I figure I can format the data one more time into an XML file. <br>
<br>
Here's a truncated version of blessfrey.me's feed as an example (also available on <a href="https://pastebin.com/bbrVp58E">Pastebin</a>): <br>
<br>
<code>&lt;?xml version="1.0" encoding="utf-8"?&gt;</code><br>
<code>&lt;rss version="2.0"&gt;</code><br>
<code>&lt;channel&gt;</code><br>
<code>&lt;title&gt;blessfrey.me&lt;/title&gt;</code><br>
<code>&lt;link&gt;https://www.blessfrey.me/&lt;/link&gt;</code><br>
<code>&lt;description&gt;chimchooree's dev space&lt;/description&gt;</code><br>
<code>&lt;language&gt;en-us&lt;/language&gt;</code><br>
<code>&lt;webMaster&gt;chimchooree@mail.com (chimchooree)&lt;/webMaster&gt;</code><br>
<code>&lt;item&gt;</code><br>
<code>&lt;title&gt;making an rss feed&lt;/title&gt;</code><br>
<code>&lt;link&gt;https://www.blessfrey.me/diary/entries/210318&lt;/link&gt;</code><br>
<code>&lt;description&gt;After a few months of quietly running my blog as practice, I want to start sharing my articles with ... &lt;/description&gt;</code><br>
<code>&lt;pubDate&gt;Thu, 18 Mar 2021 08:05 CST&lt;/pubDate&gt;</code><br>
<code>&lt;guid&gt;https://www.blessfrey.me/diary/entries/210318&lt;/guid&gt;</code><br>
<code>&lt;/item&gt;</code><br>
<code>&lt;/channel&gt;</code><br>
<code>&lt;/rss&gt;</code><br>
<br>
I'll explain each tag, but they are also described on the RSS Advisory Board's <a href="https://www.rssboard.org/rss-profile">Best Practices Profile</a>. There are more tags, too, so research documentation + examples to see what suits your website. <br>
<br>
<h3>XML declaration</h3><br>
Identifies the document as XML and defines the version + character encoding. It's required and must be the first line. <br>
<br>
<h3>RSS</h3><br>
The top-level element, which defines the RSS version number. It's required. <br>
<br>
<h3>channel</h3><br>
Nested within the RSS tags and describes the RSS feed. It's required. <br>
<br>
There are some tags nested within the channel. <code>title</code>, <code>link</code>, and <code>description</code> are required. Tag names are case-sensitive, so keep them lowercase. <br>
<br>
<ul>
<li><b>title</b> - defines the title. Mine is just my website name, but large websites may have multiple feeds. It's required. </li>
<li><b>link</b> - defines the link to the channel. So, either your website or a specific area of your website. It's required. </li>
<li><b>description</b> - describe the channel in a few words. I used my website's tagline. It's required. </li>
<li><b>language</b> - defines the language, using an <a href="https://www.rssboard.org/rss-language-codes">RSS language code</a>. It's optional. </li>
<li><b>webMaster</b> - provide the contact email + name for technical issues regarding the feed. It should look something like <code>example@example.com (Name McName)</code>. It's optional. </li>
</ul>
<br>
<h3>item</h3><br>
Nested within the channel. Each article is defined by its own item. <br>
<br>
There are some tags nested within the item. Strictly speaking, RSS 2.0 makes every item tag optional as long as either <code>title</code> or <code>description</code> is present, but I include <code>title</code>, <code>link</code>, and <code>description</code> for every article. <br>
<br>
<ul>
<li><b>title</b> - defines the title of the article. </li>
<li><b>link</b> - defines the link to the article. </li>
<li><b>description</b> - summarize the article in one or two sentences. </li>
<li><b>pubDate</b> - indicates the date and time of publication, conforming to the <a href="https://www.rssboard.org/rss-profile#data-types-datetime">RFC 822 Date and Time Specification</a>. Follow a pattern like Mon, 15 Oct 2007 14:10:00 GMT. </li>
<li><b>guid</b> - a unique identifying string, which helps aggregators detect duplicates. Aggregators may ignore this field or use it in combination with the title and link values. </li>
</ul>
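For the <code>description</code>, one possible approach is to strip the markup from the article and cut the text at a word boundary. This is only a sketch using the standard library: <code>summarize</code> and its 100-character limit are hypothetical names and values, not blessfrey.me's actual summary code, and the regex tag stripper assumes simple, well-behaved HTML. <br>

```python
import html
import re
import textwrap


def summarize(article_html, limit=100):
    """Return a plain-text summary no longer than `limit` characters."""
    # Replace tags with spaces so the words on either side stay separated
    text = re.sub(r"<[^>]+>", " ", article_html)
    # Turn entities such as &amp; back into plain characters
    text = html.unescape(text)
    # Collapse whitespace and cut at a word boundary, marking the cut with " ..."
    return textwrap.shorten(text, width=limit, placeholder=" ...")


print(summarize("<p>After a few months of <b>quietly</b> running my blog</p>", limit=30))
```

The result is plain text, so characters like <code>&amp;</code> still need to be XML-escaped when the summary is written into the feed. <br>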
<br>
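The <code>pubDate</code> pattern can be produced with Python's standard library instead of being assembled by hand. This is a minimal sketch, assuming articles are identified by a YYMMDD code like 210318; <code>rss_pub_date</code> and its parameters are hypothetical and may not match how blessfrey.me formats its timestamps. <br>

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rss_pub_date(code, hour=0, minute=0, utc_offset_hours=0):
    """Turn a YYMMDD code into an RFC 822 style date string."""
    day = datetime.strptime(code, "%y%m%d")
    tz = timezone(timedelta(hours=utc_offset_hours))
    stamp = day.replace(hour=hour, minute=minute, tzinfo=tz)
    # usegmt=True writes "GMT" and is only allowed for a zero UTC offset;
    # any other offset is written numerically, e.g. -0600
    return format_datetime(stamp, usegmt=(utc_offset_hours == 0))


print(rss_pub_date("210318", hour=8, minute=5))
# Thu, 18 Mar 2021 08:05:00 GMT
print(rss_pub_date("210318", hour=8, minute=5, utc_offset_hours=-6))
# Thu, 18 Mar 2021 08:05:00 -0600
```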
<h2>how to make an RSS feed </h2><br>
I'm generating the RSS every time I update my website. That way, it always exists on the server and can be served as a static file. If I used a template, the RSS file would not exist unless it was accessed. I'm not sure if that would be an issue and haven't experimented. <br>
<br>
Since Bottle is just Python, I can generate the file similar to how I format my articles into diary snippets, pages, and headlines. <br>
<br>
I'll share the main code, but the full code can be viewed on <a href="https://pastebin.com/E0PvNtbp">Pastebin</a>. <br>
<br>
<h3>code example </h3><br>
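from bottle import route, run, static_file <br>
<br>
# Helpers such as gather_and_sort() and find_title() are in the full code on Pastebin <br>
<br>
# Build the feed from the 15 newest articles <br>
def make_rss(): <br>
&nbsp;&nbsp;&nbsp;&nbsp;loc = 'diary/entries/' <br>
&nbsp;&nbsp;&nbsp;&nbsp;info = {'items': list_items(gather_and_sort(loc)[0:15])} <br>
<br>
# Write the RSS file and return the list of items <br>
def list_items(articles): <br>
&nbsp;&nbsp;&nbsp;&nbsp;f_name = "static/xml/blessfrey.xml" # the RSS file <br>
&nbsp;&nbsp;&nbsp;&nbsp;loc2 = 'https://www.blessfrey.me' # site root, no trailing slash <br>
&nbsp;&nbsp;&nbsp;&nbsp;loc = 'diary/entries/' <br>
&nbsp;&nbsp;&nbsp;&nbsp;result = [] <br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;# Collect [title, url, summary, timestamp] for each article <br>
&nbsp;&nbsp;&nbsp;&nbsp;for article in articles: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;path = loc + article <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;text = article2list(article, loc) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a = [] <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a.append(find_title(text)) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a.append(find_url(path)) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a.append(clean_tags(prepare_rss_summary(text, path))) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a.append(find_timestamp(text)) <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result.append(a) <br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;# Opening with 'w' empties any existing file before writing <br>
&nbsp;&nbsp;&nbsp;&nbsp;with open(f_name, 'w', encoding='utf-8') as f: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;?xml version=\"1.0\" encoding=\"utf-8\"?&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;rss version=\"2.0\"&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;channel&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;title&gt;blessfrey.me&lt;/title&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;link&gt;https://www.blessfrey.me/&lt;/link&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;description&gt;chimchooree's dev space&lt;/description&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;language&gt;en-us&lt;/language&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;webMaster&gt;chimchooree@mail.com (chimchooree)&lt;/webMaster&gt;" + '\n') <br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for r in result: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Judging by the sample feed, r[1] already starts with '/diary/entries/', <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# so only the site root is prepended to avoid doubling the path <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;url = loc2 + r[1] <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;item&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;title&gt;" + r[0] + "&lt;/title&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;link&gt;" + url + "&lt;/link&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;description&gt;" + r[2] + "&lt;/description&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Reduce the path to the article's date code, e.g. 210318 <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;code = r[1].replace(loc,'') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;code = code.replace('/','') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;pubDate&gt;" + format_rss_time(code) + "&lt;/pubDate&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;guid&gt;" + url + "&lt;/guid&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;/item&gt;" + '\n') <br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;/channel&gt;" + '\n') <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write("&lt;/rss&gt;" + '\n') <br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;return result <br>
<br>
# Serve XML <br>
@route('/static/xml/&lt;filename:path&gt;') <br>
def serve_xml(filename): <br>
&nbsp;&nbsp;&nbsp;&nbsp;return static_file(filename, root='static/xml', mimetype='text/xml') <br>
<br>
## Main ## <br>
<br>
if __name__ == '__main__': <br>
&nbsp;&nbsp;&nbsp;&nbsp;make_rss() <br>
&nbsp;&nbsp;&nbsp;&nbsp;run(host='127.0.0.1', port=9001) <br>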
<br>
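Writing the tags as plain strings works, but a title or summary containing <code>&amp;</code> or <code>&lt;</code> would produce invalid XML unless the helper functions escape it first. As an alternative sketch, not the code blessfrey.me runs, the standard library's <code>xml.etree.ElementTree</code> can build the same structure and handle the escaping. <code>build_feed</code> and the dictionary layout of <code>items</code> are assumptions made for this example. <br>

```python
import xml.etree.ElementTree as ET


def build_feed(items, title="blessfrey.me", link="https://www.blessfrey.me/",
               description="chimchooree's dev space"):
    """Return an RSS 2.0 document as a string.

    `items` is a list of dicts keyed by RSS tag name (title, link, ...).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for entry in items:
        item = ET.SubElement(channel, "item")
        # Only write the tags this entry actually provides
        for tag in ("title", "link", "description", "pubDate", "guid"):
            if tag in entry:
                ET.SubElement(item, tag).text = entry[tag]

    # ElementTree escapes &, < and > inside element text
    body = ET.tostring(rss, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body


feed = build_feed([{
    "title": "cats & dogs",
    "link": "https://www.example.com/1",
    "description": "an example entry",
    "pubDate": "Thu, 18 Mar 2021 08:05:00 GMT",
    "guid": "https://www.example.com/1",
}])
print(feed)
```

The returned string can then be written to the static file with <code>open(path, 'w', encoding='utf-8')</code>, the same as before. <br>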
<h2>double-check </h2><br>
The <a href="https://www.rssboard.org/rss-validator/">RSS Advisory Board</a> and <a href="https://validator.w3.org/feed/">W3C</a> have feed validation services that can check the syntax of Atom or RSS feeds. It's nice to check but don't feel pressured to meet all the recommendations if they don't suit your needs. <br>
<br>
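Before reaching for an online validator, a quick local check can catch the most basic problems. This sketch only tests that the file is well-formed XML and has the required tags described earlier; it is far less thorough than the validators above, and <code>check_feed</code> is a hypothetical helper, not part of any library. <br>

```python
import xml.etree.ElementTree as ET


def check_feed(xml_text):
    """Return a list of problems found in an RSS document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "rss":
        problems.append("root element is not <rss>")
    channel = root.find("channel")
    if channel is None:
        return problems + ["missing <channel>"]

    # The three required channel tags
    for tag in ("title", "link", "description"):
        if channel.find(tag) is None:
            problems.append(f"channel is missing <{tag}>")

    # Every item needs at least a title or a description
    for n, item in enumerate(channel.findall("item"), start=1):
        if item.find("title") is None and item.find("description") is None:
            problems.append(f"item {n} has neither <title> nor <description>")
    return problems


sample = ("<rss version='2.0'><channel><title>t</title><link>l</link>"
          "<description>d</description><item><title>x</title></item>"
          "</channel></rss>")
print(check_feed(sample))
```

To check a real file, read it in binary mode and pass the bytes, for example <code>check_feed(open('static/xml/blessfrey.xml', 'rb').read())</code>. <br>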
<h2>the result </h2><br>
Now I have an RSS feed, available at <a href="/static/xml/blessfrey.xml">https://blessfrey.me/static/xml/blessfrey.xml</a>. Feel free to use it if you prefer to read through an aggregator, and contact me if there are technical problems. <br>
<br>
Last updated March 19, 2021 <br>
<br>