Webu is completely free, has no restrictions and the Delphi source code is provided.
Contents:
Monitoring RSS / Atom news feeds
AI / Learning about your interests
Keyword format
Configuration
Monitoring web sites for keywords
Searching web sites for interesting content
FAQ
Monitoring RSS / Atom feeds
To monitor RSS and Atom feeds, click the orange icon, which brings up the feeds page. Enter the address of the RSS / Atom feed and click "Add feed" (the address will be a URL, e.g. http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/uk/rss.xml). The feed will be added to the list on the left. You can then define the keywords (see Keyword format) and the refresh interval (how often the feed is checked). The active feeds will be searched automatically at the refresh interval, but you can use the buttons to search individual feeds or all active feeds if you don't want to wait for the refresh interval. When an article is found that contains your keywords, Webu will pop up a window with the details and a link to the originating website.
Note that for RSS feeds the TITLE and DESCRIPTION fields are checked for keywords, and for Atom feeds the TITLE and CONTENT fields are checked.
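As an illustration of the note above, here is a minimal Python sketch (not Webu's Delphi code) that pulls out the text that would be checked for keywords: TITLE plus DESCRIPTION for RSS items, TITLE plus CONTENT for Atom entries. The function name and the sample feeds are made up for the example, and it assumes plain-text content in well-formed XML.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace

def searchable_text(feed_xml):
    """Return one string per article: the fields that get checked for keywords."""
    root = ET.fromstring(feed_xml)
    texts = []
    if root.tag == ATOM_NS + "feed":
        # Atom: TITLE and CONTENT of each entry
        for entry in root.iter(ATOM_NS + "entry"):
            title = entry.findtext(ATOM_NS + "title", default="")
            content = entry.findtext(ATOM_NS + "content", default="")
            texts.append(f"{title}\n{content}")
    else:
        # RSS: TITLE and DESCRIPTION of each item
        for item in root.iter("item"):
            title = item.findtext("title", default="")
            description = item.findtext("description", default="")
            texts.append(f"{title}\n{description}")
    return texts

# Made-up sample feeds, for illustration only
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Talks resume</title><description>Nuclear talks with Iran resumed.</description></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example feed</title>
<entry><title>Pentagon report</title><content>A hack was reported.</content><summary>not checked</summary></entry>
</feed>"""

print(searchable_text(RSS_SAMPLE))
print(searchable_text(ATOM_SAMPLE))
```

Note that the feed's own title and the Atom summary are left out, because only the per-article fields named above are collected.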
AI / Learning
"The more contact I have with humans, the more I learn" - Cyberdyne Systems Model T-101, "Terminator 2"
Simple keyword monitoring isn't always sufficient for informing you about news articles that you are interested in, so the program can learn over time about the kind of articles that you like, and will start to show you similar articles even if they don't contain any of your keywords. It works as follows: each time Webu shows you a web page it will offer you the chance to rate the page on a scale of "totally irrelevant" to "very relevant". As you indicate your preferences for or against particular types of content, Webu will start to learn about the kind of stories that interest you and will start to show you other articles it thinks you may find interesting. The more you provide feedback, the more it will learn about what interests you.
To allow you to kickstart the learning process I have put a "Learn" button on the RSS/Atom form - when you click this it will show you all articles from the selected feed and ask you to rate them. This can take a while, but the more feedback you provide, the more it will learn. If it starts to show too many or too few articles you can adjust the slider "Show articles with relevance LOW...HIGH" - the nearer to "HIGH" you move the slider position, the fewer articles will be shown as an article will need a very high relevance to trigger an alert. You can repeat the learning process as many times as you like with different feeds - the more feedback you give the program, the more it will learn about your interests. I recommend you run this at least 10 times on a variety of different feeds, e.g. news, entertainment, sport, business, politics etc, so that the program can start to build up a comprehensive database.
To start you off I have added some feeds from the BBC website, so to start the program learning, click "Learn" against a number of these feeds (you can delete any feeds you don't like or add ones from other websites).
I would appreciate any feedback about how well the learning algorithm is working - there are some parameters that can be tuned to optimise the process, but as I have only written this recently I am not sure exactly what these should be. Please feel free to let me know if you think the program works well or could be improved. If it shows you an article you don't like, simply rate it with a negative score and it will update its relevance database accordingly - this will reduce the probability of similar articles being shown.
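To give a feel for how this kind of feedback loop can work, here is a small Python sketch. It is NOT Webu's actual algorithm (which is in the Delphi source); it is a hypothetical word-score model in which every rating is added to the score of each word in the article, and an article is shown when its average word score reaches the slider threshold. The class name, the rating scale of -2 to +2 and the threshold values are all assumptions made for the example.

```python
import re
from collections import defaultdict

class RelevanceModel:
    """Hypothetical word-score model, for illustration only."""

    def __init__(self):
        self.scores = defaultdict(float)  # word -> accumulated rating

    @staticmethod
    def words(text):
        # Distinct lowercase words in the text
        return set(re.findall(r"[a-z']+", text.lower()))

    def rate(self, text, rating):
        """Record feedback, e.g. -2 (totally irrelevant) to +2 (very relevant)."""
        for word in self.words(text):
            self.scores[word] += rating

    def relevance(self, text):
        """Average score of the words in the text (0.0 for unknown words)."""
        words = self.words(text)
        if not words:
            return 0.0
        return sum(self.scores.get(w, 0.0) for w in words) / len(words)

    def should_show(self, text, threshold):
        """A higher threshold (slider nearer HIGH) shows fewer articles."""
        return self.relevance(text) >= threshold

model = RelevanceModel()
model.rate("Rail strike delays trains", +2)   # positive feedback
model.rate("Celebrity gossip roundup", -2)    # negative feedback

print(model.relevance("Trains delayed again"))
print(model.relevance("More celebrity gossip"))
```

In this sketch a negative rating lowers the score of every word in the disliked article, which is one simple way to reduce the probability of similar articles being shown.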
A brief note on privacy - Webu will gradually build up a large store of information about the kind of things you find interesting, and this information is stored in the "DATA" directory. If this concerns you, you may want to consider restricting access to this directory, or if you are really paranoid, putting the application on a TrueCrypt drive. Please note that at no point does the program transmit any of this data anywhere else.
Keyword format
For keyword monitoring you define the keywords by entering one or more words per line, separated by commas. An alert is only triggered if all keywords on a line are found. For example:
1:iran,nuclear
Note that only lines that start with "1:" are considered "active" and will be checked - this is so that you can temporarily disable a set of keywords without having to remove them entirely.
You can specify words that must not be found by putting a ! character in front of the word, e.g. suppose you wanted to show articles that contain the words "George" and "Bush" but not "Iraq", you could put "1:george,bush,!iraq".
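The keyword rules above can be sketched in a few lines of Python. This is an illustration rather than Webu's own code: the function names are invented, and it assumes that matching is case-insensitive and that a keyword matches anywhere in the text (substring matching).

```python
def parse_keyword_line(line):
    """Split a line such as '1:george,bush,!iraq' into (active, required, excluded)."""
    # Only the first ':' separates the flag, so keywords like '17:12' stay intact
    flag, _, rest = line.partition(":")
    terms = [t.strip().lower() for t in rest.split(",") if t.strip()]
    required = [t for t in terms if not t.startswith("!")]
    excluded = [t[1:] for t in terms if t.startswith("!") and len(t) > 1]
    return flag.strip() == "1", required, excluded

def matches(line, text):
    """True if the line is active, all required words are found and no excluded word is."""
    active, required, excluded = parse_keyword_line(line)
    text = text.lower()
    return (active
            and bool(required)
            and all(t in text for t in required)
            and not any(t in text for t in excluded))

print(parse_keyword_line("1:george,bush,!iraq"))
print(matches("1:iran,nuclear", "Iran resumes nuclear talks"))
print(matches("1:george,bush,!iraq", "George Bush spoke about Iraq"))
```

A line that does not start with "1:" is parsed but never matches, which mirrors the way a set of keywords can be disabled without deleting it.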
Other configuration
There are some additional configuration options that are defined in the file webu.conf. Some of the more useful options are described below:
proxyHost=
repeatInterval=10080
loadPagesOnStartup=false
forceToForeground=true
showNagScreens=true
runAfterIdleFor=0
verboseOutput=false
processMessagesInterval=10
Common words
Monitoring web sites
Start Webu and select the menu "File..New" (or click the top-left icon) and a new window will appear. Enter the address of the web site to monitor and the refresh interval, then enter the keywords (see below for an example of how to enter the keywords). Then wait for the window to start updating, or refresh it at any time by clicking the "Go" button. Note that the program only searches the raw text and not anything inside HTML tags. You define the keywords by entering one or more words per line, separated by commas. To trigger a match, all the keywords on a particular line must be found on a single line on the web page. For example, suppose you enter the keywords as follows:
1:iran,attack
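Now, if a web page contains the following line, this will not trigger an alert. The words "attack" and "bush" both appear, but they belong to different keyword lines ("1:iran,attack" and "1:george,bush"), and neither keyword line has all of its words on this line of the page: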
A new pesticide will attack insects that eat the gorse bush
However, if the website contains the following line, this will trigger an alert because the keywords "george" and "bush" are on the same line:
George Bush visited Europe today
This helps to avoid the kind of false-positives you often get when using search engines. If you want to see the raw lines of text that appear on a particular page, click the "Show Raw Text" button when the page has loaded (not available on RSS/Atom feeds).
Another example - in the UK there is a rail enquiries website that shows train departure times, so you can see if your train home is delayed, e.g. http://www.livedepartureboards.co.uk/ldb/sumdep.aspx?T=WAT shows the train times for London Waterloo. Entering the following keywords will tell you if the 17:12 or 17:23 to Basingstoke is On time, Delayed or Cancelled:
1:Basingstoke,17:12,On time
You can specify words that must not be found by putting a ! character in front of the word, e.g. suppose you wanted to show articles that contain the words "George" and "Bush" but not "Iraq", you could put "1:george,bush,!iraq".
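The same-line rule for web sites can be illustrated with a short Python sketch. It is not taken from Webu: the class and function names are invented, and the choice of which HTML tags start a new line of raw text (BLOCK_TAGS) is an assumption, since the way Webu splits a page into lines is not described here. The sketch keeps only the text outside tags, so words that appear only inside a tag (such as a link address) are never matched.

```python
from html.parser import HTMLParser

# Assumption: these tags start a new line of raw text
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "title"}

class RawText(HTMLParser):
    """Collects the text outside HTML tags, breaking lines at block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def raw_lines(html):
    """Return the non-empty lines of raw text, with whitespace normalised."""
    parser = RawText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]

def matching_lines(keywords, html):
    """Lines of raw text that contain ALL of the given lowercase keywords."""
    return [line for line in raw_lines(html)
            if all(k in line.lower() for k in keywords)]

PAGE = """<html><body>
<p>A new pesticide will attack insects that eat the gorse bush</p>
<p><a href="/bush-george">George Bush</a> visited Europe today</p>
</body></html>"""

print(matching_lines(["george", "bush"], PAGE))
print(matching_lines(["iran", "attack"], PAGE))
```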
Note - if the web site you wish to monitor has an RSS/Atom feed it will almost certainly be better to use this as the program will run a lot faster and take up less memory. Only use this option if you need to monitor a site without feeds.
Searching web sites for interesting content
I wanted to be able to instruct my PC to "go and search the web for something I may find interesting, and let me know when you have found it". Therefore I built into Webu a page where you can put in a starting web page, and it will follow all the links on that page, examining each page to see how relevant it is, and also following the links on that page, etc, etc. It tries to follow links that look interesting, i.e. if a particular link text scores a high relevance factor it will be searched before a link where the text scores lower. Any web pages where the actual content scores a high relevance factor are added to a list of found pages, sorted by relevance. The process basically continues until you stop it, or it runs out of links. In order to prevent too many pages being added, you can adjust the slider - the nearer you move it to "HIGH" the fewer pages will be added.
You will get vastly different results depending on the starting web page - it is impossible to predict what kinds of sites it will find, so if it gets stuck or goes wandering off and you want to stop it, just click the Stop button and it will stop searching. By default it will use Google News, but you can set the starting page to any address you like.
Please note that this will be more effective if Webu has a good idea of the kind of content that interests you, so run a few "Learn" sessions before trying to use it, otherwise it will just return irrelevant pages. Also note that this really requires a pretty powerful PC to work properly - a typical web page has a lot more text than an RSS article and it can take the program a very long time to analyse the page. I have a really powerful gaming PC at home and it takes a while to analyse a page even on this. To be honest this feature is more for interest than to provide a useful search utility, but sometimes it can be interesting to see what web sites it finds.
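The search described in this section is essentially a best-first crawl: links whose text scores highest are followed first, and pages whose content scores above the slider threshold are kept. Below is a minimal Python sketch of that idea, not Webu's implementation. The fetch and score callbacks, the max_pages limit and the tiny in-memory "web" are all hypothetical, so nothing is downloaded.

```python
import heapq

def crawl(start, fetch, score, threshold, max_pages=50):
    """Best-first crawl from a starting page.

    fetch(url) -> (page_text, [(link_text, link_url), ...])   (hypothetical callback)
    score(text) -> relevance as a float                       (hypothetical callback)
    """
    queue = [(0.0, 0, start)]  # (negated link score, insertion order, url)
    seen = {start}
    found = []
    order = 1
    visited = 0
    while queue and visited < max_pages:
        _, _, url = heapq.heappop(queue)  # most promising link first
        visited += 1
        text, links = fetch(url)
        relevance = score(text)
        if relevance >= threshold:
            found.append((relevance, url))
        for link_text, link_url in links:
            if link_url not in seen:
                seen.add(link_url)
                heapq.heappush(queue, (-score(link_text), order, link_url))
                order += 1
    return sorted(found, reverse=True)  # most relevant page first

# A made-up web of four pages: url -> (page text, links)
WEB = {
    "start": ("front page", [("rail news", "rail"), ("celebrity gossip", "gossip")]),
    "rail": ("rail strike delays rail services", []),
    "gossip": ("celebrity gossip roundup", [("rail timetable", "times")]),
    "times": ("rail timetable changes", []),
}

def toy_score(text):
    # Toy relevance: how often the word "rail" appears
    return float(text.count("rail"))

results = crawl("start", WEB.__getitem__, toy_score, threshold=1.0)
print(results)
```

Raising the threshold plays the role of moving the slider towards "HIGH": fewer pages are added to the list of found pages.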
Frequently asked questions
If you have any questions or want to report a bug please drop me a line at
Webu 1.18
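Further examples of the keyword format (the first line is inactive because it does not start with "1:"):
0:grand theft auto,game
1:pentagon,hack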
proxyPort=
proxyUsername=
proxyPassword=
proxyBasicAuthentication=true
Together with proxyHost, these options set the proxy server configuration. You may need them if you are using the program at work.
repeatInterval: the number of minutes to wait before showing a particular alert again, i.e. if the program has already popped up a message about a particular keyword, you don't want it to show you the same alert 5 minutes later. The default is 10080, which is the number of minutes in 1 week.
loadPagesOnStartup: if you set this to true, the program will perform an immediate search when it starts up. If false, the program will wait until the "refresh interval" has elapsed.
forceToForeground: if this is set to true, Webu will always pop up the alert window in front of any applications that are running. If this is set to false, it will simply flash the icon on the taskbar when it needs to alert you.
showNagScreens: if this is set to true, Webu will show a reminder message if you do not provide a relevance rating for at least 50% of the items it shows you. To stop these reminders, set this to false.
runAfterIdleFor: this setting can be used to try to stop the program taking up lots of CPU and bandwidth while you are working on your PC. If it is set to a non-zero value, the program will only scan the feeds for articles when your PC has been idle for at least this number of seconds (i.e. you have not used the mouse or keyboard). If left at zero, the program will not check whether you are using your PC.
verboseOutput: the program writes output as it runs to a status panel on each page, so that you can see what it is doing. If you set this option to true, additional information will be written which can help diagnose problems. Normally you should set this to false as the program will perform a lot faster.
processMessagesInterval: because the program can take a long time to analyse pages for relevance, it can become unresponsive to user events such as mouse clicks (although it will respond eventually). This value specifies (in seconds) how often the program should check for outstanding GUI messages such as mouse clicks. The lower this value, the more frequently the program will respond, but this will increase the time taken to scan articles and web pages.
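The options in webu.conf are simple name=value lines, so they are easy to read with a script. The Python sketch below is an illustration only: the helper names are invented and it assumes the file contains nothing but name=value lines. It also shows how repeatInterval could be applied to decide whether an alert may be shown again.

```python
from datetime import datetime, timedelta

def parse_conf(text):
    """Read name=value lines into a dictionary of strings."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or unrecognised lines
        name, _, value = line.partition("=")
        options[name.strip()] = value.strip()
    return options

def as_bool(value):
    """Interpret 'true' / 'false' option values."""
    return value.lower() == "true"

def can_repeat(last_shown, now, repeat_minutes):
    """True once at least repeat_minutes have passed since the alert was last shown."""
    return now - last_shown >= timedelta(minutes=repeat_minutes)

SAMPLE = """proxyHost=
repeatInterval=10080
loadPagesOnStartup=false
forceToForeground=true
showNagScreens=true
runAfterIdleFor=0
verboseOutput=false
processMessagesInterval=10
"""

conf = parse_conf(SAMPLE)
repeat_minutes = int(conf["repeatInterval"])

print(repeat_minutes == 7 * 24 * 60)           # one week in minutes
print(as_bool(conf["forceToForeground"]))
print(can_repeat(datetime(2024, 1, 1), datetime(2024, 1, 1, 0, 5), repeat_minutes))
```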
The file "data\commonwords.dat" contains a list of common words that are not included when checking the relevance of a piece of text. You can add additional words to this file if you like - this can help to stop false-positives, i.e. where a page has a higher relevance rating than it should have due to the presence of common words such as "the", "and", "news", "july", "said", "went" etc.
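The effect of the common words list can be shown with a few lines of Python. This is only a sketch: the function names are invented, and it assumes a list with one word per line, which may not be the actual layout of commonwords.dat.

```python
import re

def load_common_words(lines):
    """Build a set of common words from lines of text (assumed: one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}

def significant_words(text, common):
    """Words of the text, in order, with the common words left out."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in common]

# The example words mentioned above, as they might appear in the file
common = load_common_words(["the", "and", "news", "july", "said", "went"])

print(significant_words("The minister said the talks went well", common))
```

Only the remaining words would then count towards the relevance of the text, so frequent but uninformative words cannot inflate the rating.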
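The second keyword line for the "Monitoring web sites" example above is:
1:george,bush
The remaining keyword lines for the Basingstoke train example are:
1:Basingstoke,17:12,Delayed
1:Basingstoke,17:12,Cancelled
1:Basingstoke,17:23,On time
1:Basingstoke,17:23,Delayed
1:Basingstoke,17:23,Cancelled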
As I don't get a lot of time to respond to emails, I have created a forum where you can discuss this software with other users. Feel free to post bug fixes, suggestions for improvements, questions or anything else related to this software.
The forum link is http://wuulsoftware.freeforums.org/
Previous versions of Webu:
Webu 1.17
Webu 1.16
Webu 1.15
Webu 1.14
Webu 1.13
Webu 1.12
Webu 1.11
Webu 1.10
Webu 1.09
Webu 1.08
Webu 1.07
Webu 1.06
Webu 1.05
Webu 1.04
Webu 1.03
Webu 1.02
Webu 1.01
Webu 1.0
Webu 0.9
Webu 0.8
Webu 0.7
Webu 0.6
Webu 0.5
Webu 0.4
Webu 0.3
Webu 0.2
Webu 0.1
To build the software you will need these additional components