Finding Events: Submit a URL and Get the Event Probability

I've come up with an algorithm that predicts whether a webpage describes an activist/liberal event. It's designed for conferences, but works for other events too (convergences, assemblies, climate camps, etc.).

It's based on tests done on 20,000 webpages. I ran logistic regression tests using 5,000 keywords, of which 170 are included in the algorithm. In my test data set, it correctly identifies 74% of the events (at the page level) and 99% of the non-events.
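To give a feel for how a logistic regression over keywords turns a page into a probability, here is a minimal sketch. The keywords, weights, and intercept below are made up for illustration (they are not the 170 keywords or coefficients from my tests), and it assumes each keyword counts once if it appears anywhere on the page.

```python
import math
import re

# Hypothetical keyword weights for illustration only; the real model's
# 170 keywords and their coefficients are not listed in this post.
WEIGHTS = {
    "conference": 1.2,
    "register": 0.8,
    "workshop": 0.9,
    "recipe": -1.5,
}
INTERCEPT = -2.0  # hypothetical bias term


def event_probability(text: str) -> float:
    """Return P(event) from a logistic model over keyword presence."""
    # Reduce the page text to a set of lowercase words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Linear score: intercept plus the weight of every keyword present.
    score = INTERCEPT + sum(w for kw, w in WEIGHTS.items() if kw in words)
    # The logistic (sigmoid) function maps the score into the range 0..1.
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    print(event_probability("Register now for the conference and workshop"))
    print(event_probability("My favourite soup recipe"))
```

Positive weights push the probability up, negative weights push it down, and a page with no keywords at all falls back to whatever the intercept gives.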

At the site level, I'm identifying around 90-95% of the events! Often an event is only briefly mentioned on a webpage, which isn't enough to trigger the algorithm, but in most cases the same site has a page with more details that will trigger it.
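One way to picture the site-level result: a site counts as found if at least one of its pages triggers. The sketch below assumes page probabilities have already been computed and uses a hypothetical cutoff of 0.5; the function names and the cutoff are mine for illustration, not part of the live algorithm.

```python
from typing import Dict, Optional, Tuple

THRESHOLD = 0.5  # hypothetical cutoff for "this page triggers"


def site_has_event(page_probs: Dict[str, float],
                   threshold: float = THRESHOLD) -> bool:
    """A site is flagged if any single page reaches the threshold."""
    return any(p >= threshold for p in page_probs.values())


def best_page(page_probs: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the (url, probability) pair with the highest probability."""
    if not page_probs:
        return None
    return max(page_probs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up probabilities: the front page only mentions the event briefly,
    # while the details page scores high.
    pages = {"/": 0.20, "/news": 0.35, "/gathering-2010": 0.91}
    print(site_has_event(pages))
    print(best_page(pages))
```

Because only one page per site has to trigger, the site-level hit rate can be well above the page-level rate.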


Try out the algorithm!

Things To Do
-Let users submit domains, then find the best result for each domain by spidering its pages to a depth of 1 or possibly 2.
-Detect duplicates.
-Detect old events.
-Differentiate between "activist" and "non-activist" events, possibly with a "semi-activist" category in between.
-Let users improve the algorithm by voting on pages in our index.
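
For the domain-spidering item, a depth-limited crawl could look roughly like the sketch below. It is only an outline of the idea: `get_links` is a hypothetical callback standing in for "fetch the page and extract its links", and the example runs against a small in-memory link graph rather than the real web.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 1) -> List[str]:
    """Breadth-first crawl from start, following links up to max_depth hops."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for link in get_links(url):
            if link not in seen:  # skip pages that were already visited
                seen.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order


if __name__ == "__main__":
    # Made-up link graph: the home page links to two pages, one of which
    # links on to an event details page.
    graph = {
        "/": ["/about", "/events"],
        "/about": ["/"],
        "/events": ["/events/camp"],
        "/events/camp": [],
    }
    print(crawl("/", lambda url: graph.get(url, []), max_depth=1))
    print(crawl("/", lambda url: graph.get(url, []), max_depth=2))
```

Each page returned by the crawl would then be scored, and the highest-scoring page kept as the result for the domain.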