Developing a Bot to Find Upcoming Activist Events

One of my goals is to have the best activist events calendar (starting with the US), and to keep making it a lot better.

I'm pretty sure I've already got the best calendar (you can see it at www.campusactivism.org/searchevent.php - hit submit without entering any criteria), though it could be much better.

I've started collecting a pretty good list of recurring annual conferences (70 of them). A lot of conferences recur, so they're easier to list (or to encourage the organizers to list themselves - which is the goal).

Now I've got a database of 2000 events that have happened over the past 7 years. I'm wondering how I could use this information to create a "bot" or some kind of program that would use the Google search API to identify:
1) recurring conferences
2) non-recurring conferences

I'm also looking for other major events like trainings, institutes, and other annual programs. Generally I want things that are state level or bigger, so they'd be of use to a lot of people (by contrast, local peace vigils aren't so important to track). Major city events are nice too. I'd be interested in tracking protests, but most of them aren't recurring, so they're harder to do (notable exceptions: SOA protest, IMF/WB protests in DC).

Data that I have to work with
-event name
-event type (conference, protest, speaker, day of action, other)
-location (address, city, state, zip, country, long/lat)
-event description (can range from a short paragraph to several pages listing all the workshops)
-date

Things I could do
-Keyword count - I could give Google search results a score (like a spam score) that is calculated from how many times they contain keywords like "conference", "activist", "progressive", etc.
-A correlation matrix - I want to correlate either each event that I have, or some kind of average of the events, with Google search results to find matching events. (The tricky part might be throwing out the matches for events that I already have listed.) I could also come up with several types of events (ex. media conference, state progressive conference, student activist conference, etc.) and try to calculate matching scores for those (how likely is this search engine result/webpage to be a peace conference versus an anti-racist conference).
-Identify recurring key phrases that are two words or longer and then do Google searches for those.
-Identify a list of websites with calendars so I can cull events from them (I already have some of this). You could have a bot that runs on every activist website you can think of, looks at the homepage, and perhaps tries to find an events/calendar page. (Getting a list of websites from the riseup people who did Activista might be useful. Activista was a riseup search engine project that no longer exists.)
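
To make the keyword-count and recurring-phrase ideas above more concrete, here is a minimal Python sketch. The keyword weights, the function names, and the sample event names in the usage note are all made up for illustration; real weights would need tuning against pages I've already reviewed by hand.

```python
import re
from collections import Counter

# Hypothetical keyword weights (assumption): higher means more "event-like".
KEYWORDS = {"conference": 3, "activist": 2, "progressive": 2, "training": 1}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_score(text, weights=KEYWORDS):
    """Sum weight * occurrences for each keyword, like a simple spam score."""
    counts = Counter(tokenize(text))
    return sum(weight * counts[word] for word, weight in weights.items())

def recurring_phrases(event_names, n=2, min_events=2):
    """Return n-word phrases that appear in at least min_events event names."""
    seen = Counter()
    for name in event_names:
        words = tokenize(name)
        # Use a set so a phrase is counted once per event name.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        seen.update(phrases)
    return [phrase for phrase, count in seen.most_common() if count >= min_events]
```

For example, `keyword_score("Student activist conference")` adds the weights for "activist" and "conference", and `recurring_phrases(["Great Lakes Student Summit 2005", "Great Lakes Student Summit 2006"])` returns phrases such as "great lakes" that could then be fed back into a search. Years are dropped by the tokenizer on purpose, so the same event in different years produces the same phrases.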

I don't know anything about textual analysis. Any suggestions? What kind of algorithms, libraries, or MySQL functions might cover this? Perhaps the same kind of software that is used to identify spam?
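
One simple technique that might fit here is cosine similarity over word counts: treat each text as a bag of words and measure how much two bags overlap. The sketch below is only an illustration under that assumption, not a recommendation of a particular library; the function names and the `known_events` dictionary (event name mapped to description) are hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Turn text into a bag of lowercase words with their counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(page_text, known_events):
    """Return (event_name, score) for the known event most similar to the page.

    known_events is assumed to be a dict of event name -> description.
    """
    if not known_events:
        return None, 0.0
    page = word_counts(page_text)
    scored = [(cosine_similarity(page, word_counts(desc)), name)
              for name, desc in known_events.items()]
    score, name = max(scored)
    return name, score
```

A very high score against an event I already have listed would suggest a duplicate to throw out, while a moderate score would suggest a new event of a similar type. Common words like "the" will inflate the scores, so removing them or weighting rare words more heavily would probably be a sensible next step.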

I'm not sure whether I'd need to run a bot or could just use the Google search API. Writing my own bot seems like it could be cool, but it might be hard (even the Activista bot went out of control on my website - it consumed something like 1 GB before they shut it down).
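
If I did write my own bot, the runaway-crawler problem above suggests building in limits from the start. Here is a rough sketch using `urllib.robotparser` from the Python standard library to honor robots.txt rules, plus a page budget and a pause between requests. The bot name, the default delay, the page budget, and the `fetch` callable (whatever function actually downloads a page) are assumptions for illustration.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "EventFinderBot"  # hypothetical bot name (assumption)

def load_rules(robots_lines):
    """Build a robots.txt parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser

def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits this bot to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]

def polite_delay(parser, user_agent=USER_AGENT, default=5.0):
    """Seconds to wait between requests: the site's Crawl-delay, else a default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default

def crawl(urls, fetch, parser, max_pages=50, sleep=time.sleep):
    """Fetch at most max_pages permitted URLs, pausing after each request."""
    pages = {}
    for url in allowed_urls(parser, urls)[:max_pages]:
        pages[url] = fetch(url)
        sleep(polite_delay(parser))
    return pages
```

The hard cap on pages per site is the part that would have stopped a bot from eating 1 GB of bandwidth; the delay just spreads the load out.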

The program could come up with a list of things that are likely to be conferences, and then I could manually review the results and revise the algorithm(s) based on that.

It could also sign up for email lists and use those to find upcoming events.