Friday 2 May 2014

New publications

Added the following publications:

  • The Citizen
  • Sowetan Live
  • Dispatch Live
  • The New Age
  • Business Day Live
  • Times Live
  • Daily Maverick

There are still more to add. Adding feeds still requires some manual work, although it can now be done far more generically than before. To add a new publication and its feeds, one needs to manually specify the RSS URL(s), information about how to extract the author, and any static text to remove. Static text needs removing for one of two reasons: either Reporter misidentifies it and includes it in the plaintext, or the static text is long enough (the IOL copyright notice, for example) that for short articles Reporter may ignore the main text completely and pick up the static text instead.
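As a rough illustration of the static-text step, the sketch below strips known snippets from a page before it is handed to the extractor. This is an assumption about how it could be done, not the actual code: strip_static_text and the sample copyright string are hypothetical names invented for the example.

```python
def strip_static_text(text, static_snippets):
    """Remove every known static snippet from the text.

    Hypothetical helper: 'static_snippets' is a plain list of strings
    (e.g. a copyright notice) that should never reach the extractor.
    """
    for snippet in static_snippets:
        text = text.replace(snippet, "")
    return text


# Made-up sample data, purely for illustration.
NOTICE = "Copyright Example Publisher. All rights reserved."
page = "Short story text. " + NOTICE
cleaned = strip_static_text(page, [NOTICE]).strip()
```

Removing the notice up front means a short article is no longer competing with a longer block of static text.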

At the moment I specify the feed URLs, the author-extraction rules and the static text programmatically, so a typical new entry may look something like this:


dailymaverick = Publication(
    "Daily Maverick",
    "http://www.dailymaverick.co.za",
    ["http://www.dailymaverick.co.za/rss"],  # RSS feed URL(s)
    {   # how to extract the author
        "tag_type": "li",
        "attribute_name": "urlid",
        "attribute_value": "authorid",
        "splitstring": "<div",
        "splitindex": 0,
    },
    {"attribute_name": "span", "attribute_value": "style"},  # static text to remove
)
dailymaverick.create_feeds()


But the UI to allow the same functionality should be ready soon(ish).
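
To make the author settings in the entry above a little more concrete, here is a minimal sketch of how such a dictionary might be interpreted. It is only a guess at the semantics, not the real implementation: extract_author and the sample HTML are hypothetical, and the sketch assumes that tag_type, attribute_name and attribute_value identify an element, and that splitstring and splitindex pick one piece of that element's contents.

```python
import re


def extract_author(html, spec):
    """Return the author found in 'html' according to 'spec', or None.

    Hypothetical interpretation of the spec keys:
      tag_type / attribute_name / attribute_value -> which element to find
      splitstring / splitindex -> which piece of its contents to keep
    """
    pattern = r"<{tag}\b[^>]*\b{name}\s*=\s*[\"']{value}[\"'][^>]*>(.*?)</{tag}>".format(
        tag=re.escape(spec["tag_type"]),
        name=re.escape(spec["attribute_name"]),
        value=re.escape(spec["attribute_value"]),
    )
    match = re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    # Split the element's contents and keep the requested piece.
    parts = match.group(1).split(spec["splitstring"])
    try:
        fragment = parts[spec["splitindex"]]
    except IndexError:
        return None
    # Drop any tags left inside the fragment, then trim whitespace.
    text = re.sub(r"<[^>]+>", "", fragment).strip()
    return text or None


# Made-up sample markup, purely for illustration.
author_spec = {
    "tag_type": "li",
    "attribute_name": "urlid",
    "attribute_value": "authorid",
    "splitstring": "<div",
    "splitindex": 0,
}
sample = '<ul><li urlid="authorid">Jane Doe<div class="date">2 May 2014</div></li></ul>'
author = extract_author(sample, author_spec)
```

A regular expression keeps the sketch dependency-free. A proper HTML parser would be more robust against nested or malformed markup.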

