Linux.com

Feature: Privacy

Evolution beta is a powerful personal data mining tool

By Joe Barr on August 29, 2007 (4:00:00 PM)


Roelof Temmingh has written a cool new application that lets individuals mine publicly available information. It's a cross-platform Java application called Evolution, currently in its second beta and available as a free download.

UPDATE: Roelof Temmingh has removed the software from the website, saying in an announcement: "This is due to circumstances outside of my control. I am not sure how long this outage will last, but perhaps it will be permanent..."

H. D. Moore of Metasploit fame raved about Evolution during his packed-to-the-walls presentation at Defcon XV in Las Vegas. It's an impressive piece of work, even in beta, although its GUI is not exactly the most intuitive I've ever seen.

If you're not familiar with the term "data mining," Wikipedia says it "has been defined as 'the nontrivial extraction of implicit, previously unknown, and potentially useful information from data' and 'the science of extracting useful information from large data sets or databases.'"

Still don't grok it? Think of the NSA sifting through network traffic, looking for actionable intelligence. Or if that's too conspiracy-minded for your taste, think of trying to find something new and meaningful in the results of a Google search on Paris Hilton. Evolution is kind of like that, but more aggressive in finding results, and a lot more aggressive in trying to make sense of them.

You can experiment with Evolution using either a classic or a wizard-assisted Web interface.

If you want to run the latest version of Evolution on your own desktop, first make sure you have Java 1.5 or later installed, then download the tarball and decompress it. Change into the paterva_ic3visualizer subdirectory created by the tar command and edit the configuration file evolution.conf in its bin subdirectory. For each social networking site where you have an account, replace the user name and password with your own and uncomment the entry; remove the entries for sites where you don't have an account.

You can start the program from the bin directory by entering ./paterva_ic3visualizer. A few seconds later, you should see an empty Evolution window appear. If you're running Mozilla -- you do remember the Mozilla browser, don't you? -- you're all set to begin your adventure, but if you're using Firefox, Netscape, or the Swing HTML browser you'll need to set the appropriate browser option the first time you run Evolution by clicking Options on the toolbar, and then clicking on System -> System Settings. From there, you can pick your poison from the drop-down menu for Web Browser. No other browser choices are available in the beta, so if you are using Opera or Konqueror, and you can't figure out how to make them work by hacking /etc/alternatives, you're just out of luck.

The Evolution GUI displays a toolbar across the top. Directly below it is a slider-widget to limit the maximum number of results, and a memory/garbage status display, which you can click at any time to force garbage collection. Beneath those two items is a large empty pane labeled Evolution Graph, which will hold the mining results. Along the right side is a column of three smaller panes: Graph Navigator, Palette, and Evolution Detail View.

The Graph Navigator gives you a thumbnail view of everything in the Graph. If that is more than can be displayed at once, a slightly darker shade of gray in the Graph Navigator is used to display the visible portion. You can navigate the Graph by dragging the darker gray window up and down or left and right in the Graph Navigator.

The Palette holds a number of Evolution entities. Some are infrastructure-related (domain, IP address, DNS name, and Web site) and some are personal entities (email address, person, location, phrase, phone number, and affiliation).

If you position the cursor over the Palette tab in the right-most column, the Palette menu will appear where the Graph Navigator had been displayed.

The Evolution Detail View shows you details about whatever entity currently has the focus in the Graph pane. Often, it will contain jumping-off points for your browser to link you to external sites for additional related information.

A trial run

To see how Evolution works, let's see what we can learn about a person. To begin, click and drag the Person entity from the Palette and drop it on the Graph.

It seems only fair to use Temmingh as the subject of our exercise. You can do that by double-clicking on "Name,Surname" in the Person entity box that appeared in the Graph after we dragged it there from the Palette, then typing Roelof,Temmingh -- with the comma and without any spaces -- and then pressing Enter.
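The Person entity takes its value as a single Name,Surname token. To make that input rule concrete, here is a small Python sketch of a validator for it. `parse_person` is a hypothetical helper, not part of Evolution; it simply encodes the "one comma, no spaces" rule described above.

```python
from typing import Tuple


def parse_person(value: str) -> Tuple[str, str]:
    """Split a 'Name,Surname' Person value into its two parts.

    Hypothetical helper: it encodes the rule from the article (exactly one
    comma, no spaces). It is not Evolution's own parser.
    """
    if " " in value:
        raise ValueError("no spaces allowed: use Name,Surname")
    parts = value.split(",")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly one comma between Name and Surname")
    first, last = parts
    return first, last


if __name__ == "__main__":
    print(parse_person("Roelof,Temmingh"))  # ('Roelof', 'Temmingh')
```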

Once we have a target, we need to decide what type of information we want to learn about him.

Notice how the Palette pane has been reduced to a tab once again and that the Graph Navigator pane now holds a miniature map of the Graph, including the newly created Person entity. Move the cursor over the Person entity in the Graph, and the Evolution Detail View on the right becomes populated with information about the Person. Right-click on the Person entity, and a menu offering 24 different mining operations, called transforms, appears.

The lazy thing to do is to take the 25th option, which selects all the transforms. Let's do that and see what happens. By the way, I changed the limit for maximum number of results from its default of 5 to 10 using the slider-widget. That setting affects both the length of time it takes for the transforms to be performed and the amount of data returned.
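Conceptually, selecting All runs every transform against the chosen entity and keeps only as many results as the slider allows. The Python sketch below illustrates that idea under stated assumptions: transforms are modeled as plain functions from an entity value to a list of related values, the names `run_all` and the toy transforms are hypothetical, and applying the cap per transform (rather than overall) is a guess, not documented Evolution behavior.

```python
from typing import Callable, Dict, List

# ASSUMPTION: a transform maps an entity value to a list of related values.
Transform = Callable[[str], List[str]]


def run_all(entity: str, transforms: Dict[str, Transform],
            max_results: int = 5) -> Dict[str, List[str]]:
    """Apply every transform to one entity, truncating each result list.

    max_results plays the role of the 'maximum number of results' slider
    (default 5, as in the beta). The cap is applied per transform here,
    which is an assumption about how the real tool behaves.
    """
    results: Dict[str, List[str]] = {}
    for name, transform in transforms.items():
        results[name] = transform(entity)[:max_results]
    return results


if __name__ == "__main__":
    # Toy stand-ins for real transforms; they perform no network lookups.
    toy_transforms: Dict[str, Transform] = {
        "email_guess": lambda name: [name.replace(",", ".").lower() + "@example.com"],
        "phrases": lambda name: [f"{name} #{i}" for i in range(20)],
    }
    print(run_all("Roelof,Temmingh", toy_transforms, max_results=10))
```

Raising `max_results` from 5 to 10 doubles how much each transform may return, which matches the article's observation that the slider affects both run time and the amount of data.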

Not much happens for about 45 seconds. A progress bar appears at the bottom right of the GUI, and the name of the executing transform is displayed next to it as Evolution moves from one transform to the next. When it's finished, Evolution populates the Graph pane with a couple of dozen new entities. If that's more than can fit in the pane on your system, a scroll bar appears along the bottom of the GUI, allowing you to scroll the pane horizontally to view the missing bits. As you do so, the Graph Navigator pane on the right shows you which portion of the Graph is being displayed and which is not.

If you move the cursor over the top-left entity created by the mining (in my results, the DNS name www.guildmusic.com), the first thing you'll see is a line pointing from the target Person entity to the DNS Name entity. The Evolution Detail View now displays information about the DNS Name entity. Right-click on the DNS Name, and Evolution presents three additional transforms you might want to check. Select the Website option, and a Website entity appears. Put the cursor on it, and the Detail View pane reveals its properties: the URL, a thumbnail of its front page, and the server type and platform. Move the cursor over the thumbnail image, and Evolution will start an instance of the browser specified in Options, opened to that site's URL.

The lines linking the entities provide a visual reminder of how each entity was created, and come and go as you move the cursor between the various entities. For example, the original line between Person and DNS Name disappears while you're working with the Website entity, and a new line between it and the DNS Name appears.
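The lines between entities record which entity each result was derived from and by which transform. One way to model that bookkeeping is a directed graph in which each derived entity keeps a link to its parent and the name of the transform that produced it. The sketch below assumes exactly that; the `Entity` and `Graph` classes are hypothetical, not Evolution's internals, and "All" is used as a placeholder transform name because the article doesn't say which individual transform produced the DNS name.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entity:
    kind: str                          # e.g. "Person", "DNS Name", "Website"
    value: str
    parent: Optional["Entity"] = None  # entity this one was derived from
    via: Optional[str] = None          # transform that produced it


@dataclass
class Graph:
    entities: List[Entity] = field(default_factory=list)

    def add(self, kind: str, value: str, parent: Optional[Entity] = None,
            via: Optional[str] = None) -> Entity:
        entity = Entity(kind, value, parent, via)
        self.entities.append(entity)
        return entity

    def link_for(self, entity: Entity) -> Optional[Tuple[str, str, str]]:
        """The (parent, transform, child) line shown when hovering an entity."""
        if entity.parent is None:
            return None  # a hand-placed root entity has no incoming line
        return (entity.parent.value, entity.via or "?", entity.value)

    def lineage(self, entity: Entity) -> List[str]:
        """Walk parent links back to the root, returned root-first."""
        chain: List[str] = []
        node: Optional[Entity] = entity
        while node is not None:
            chain.append(f"{node.kind}: {node.value}")
            node = node.parent
        return list(reversed(chain))


if __name__ == "__main__":
    g = Graph()
    person = g.add("Person", "Roelof,Temmingh")
    dns = g.add("DNS Name", "www.guildmusic.com", parent=person, via="All")
    site = g.add("Website", "www.guildmusic.com", parent=dns, via="Website")
    print(g.link_for(site))  # ('www.guildmusic.com', 'Website', 'www.guildmusic.com')
    print(g.lineage(site))
```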

Here's a tip for viewing information in the Detail View. If you're trying to scroll down the Detail View pane in order to read additional information, but the pane empties when you move the cursor from the Graph to the Detail View, click once on the entity in the graph. The line to the entity, and the Detail View of that entity, will remain until you click somewhere else.

In addition to the DNS Name entities included in the results for Temmingh, there are also affiliations (ZoomInfo and GoogleBooks), email addresses, phone numbers, and other Person entities. One of them is Jeff Moss, founder of Black Hat and Defcon, and another is Tiian Van Aardt. Right-clicking on either of those two individuals brings up the same 25 transform options that we began our exercise with. I selected All again for Van Aardt, and after another half minute or so, I was rewarded with a whole new crop of entities in the Graph.

After just a couple of minutes, and a couple of clicks, I had already learned enough about Temmingh to piece together a picture of him and his associates. For one thing, he appears to come from a musical family. One relative with the same name is a well-known South African composer. For another, he is an acquaintance of both Jeff Moss and Tiian Van Aardt. Those names sound like they belong to a rock 'n' roll band, so we can probably conclude that they are all musicians, perhaps even members of the same band!

I'm joking, of course. But we probably don't even want to think about government bureaucrats making equally inane calls based on much more sophisticated tools using all available data: public, private, and classified.

To clear the current results from the GUI so you can start a new exploration, click Edit -> Select All on the toolbar, and then either press the Del key or click Edit -> Delete.

One of the new features in the beta 2 release is the incorporation of the newly popular Wiki Scanner, which allows you to see who has been changing entries on Wikipedia. Here's a quick peek at what you can do with it.

After starting Evolution as before, select DNS Name from the Palette and drag it to the Graph. Then double-click on the default name shown and replace it with microsoft.com. Now right-click on the entity and select the IP Address transform. That provides you with two IP addresses for microsoft.com. Right-click on the first one and select the Net Block transform, then right-click on the resulting Net Block and select Wiki Edits. You might want to slide the Maximum Number of Results all the way to the right before you do.
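The DNS Name -> IP Address -> Net Block chain can be approximated with Python's standard library. This is a sketch under stated assumptions: `socket.gethostbyname_ex` stands in for the IP Address transform, and the net block is simply taken to be the enclosing /24 network. A real tool would more likely consult WHOIS or routing registry data, where registered blocks vary in size, so the helper names and the /24 default are illustrative only.

```python
import ipaddress
import socket
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def resolve_ips(dns_name: str) -> List[str]:
    """Rough stand-in for the IP Address transform: IPv4 addresses for a name."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(dns_name)
    return addresses


def net_block(ip: str, prefix: int = 24) -> Network:
    """Rough stand-in for the Net Block transform.

    ASSUMPTION: the enclosing /prefix network is treated as "the" block;
    real net blocks come from registry (WHOIS) data.
    """
    # strict=False masks off the host bits instead of raising an error.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


if __name__ == "__main__":
    # A reserved documentation address, so no network access is needed.
    print(net_block("192.0.2.57"))  # 192.0.2.0/24
```

Calling `resolve_ips("microsoft.com")` needs network access, and the addresses it returns today will not match the two the article found in 2007.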

The resulting entities show that someone from a Microsoft IP address has edited the Wikipedia entries for -- among other things -- linkage between Microsoft and SCO, the Chaos Computer Club, and Einstein's views on capitalism. Click on any of the results that interest you, then move the cursor to the Detail View pane. From there, you can go directly to a page in your browser showing the exact edits performed and the date they were made, or to a list of other edits made from the same IP address.
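The idea behind the Wiki Edits transform, as with WikiScanner, is to match the IP addresses recorded for anonymous Wikipedia edits against an organization's net block. A minimal sketch of that matching step follows, assuming the edits are already available as (ip, article) pairs. The function name is hypothetical, and the sample data uses reserved documentation addresses and made-up article titles, not real edit records.

```python
import ipaddress
from typing import Iterable, List, Tuple


def edits_from_block(edits: Iterable[Tuple[str, str]], block: str) -> List[str]:
    """Return titles of articles edited from any address inside `block`."""
    network = ipaddress.ip_network(block, strict=False)
    # Membership tests are False (not an error) if IP versions differ.
    return [title for ip, title in edits if ipaddress.ip_address(ip) in network]


if __name__ == "__main__":
    sample_edits = [
        ("192.0.2.10", "Example article A"),
        ("198.51.100.7", "Example article B"),
        ("192.0.2.200", "Example article C"),
    ]
    print(edits_from_block(sample_edits, "192.0.2.0/24"))
    # ['Example article A', 'Example article C']
```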

That should be enough to whet your appetite for more, especially if the idea of intelligence gathering -- whether for business, government, or personal reasons -- without breaking the law, and derived completely from public data, interests you. See the tips Temmingh has written for more things you can do with the current beta.

Conclusions

H. D. Moore was right -- this is a kick-ass application, just seething with power and potential. I will be following its development, and I suspect that many others will do the same, including a number of TLAs.

The addition of new transforms in the second beta, especially the one for Wiki Edits, shows that the Evolution framework is mature enough for new transforms to be added as pluggable add-ons. It's scary to think how powerful this tool might become.

I asked Temmingh if he knew yet how he would license, sell, or distribute Evolution when it's finished. He said that he needs to make some money from Evolution or it will die. He is considering everything from advertising to subscriptions, or selling the GUI and transforms, or selling only the GUI and making the transforms open source, and he is open to other suggestions.

If he decides to sell the GUI, he is undecided on the price, saying only that "it needs to make sense for me to do it. While I love working on this, eventually we all need to eat."


Comments on Evolution beta is a powerful personal data mining tool


Sure there isn't another name available in the whole Universe?

Posted by: Anonymous [ip: 217.216.158.87] on August 29, 2007 05:25 PM
Why did they choose *that* name for their project? Oh, man!

#

Evolution beta is a powerful personal data mining tool

Posted by: Joe Barr on August 29, 2007 05:36 PM
The executable is named paterva_ic3visualizer, which might be a better name. But remember, this thing is bleeding edge in all regards.

#

Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 160.91.145.10] on August 29, 2007 06:09 PM
Evolution is also a very commonly used Linux e-mail application. I would definitely change the name.

#

Re: Evolution beta is a powerful personal data mining tool

Posted by: Joe Barr on August 29, 2007 06:11 PM
Yep, and since I've been using that email client for a very long time, it's the first thing I think of when I see the name.

#

Re(1): Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 13.8.137.11] on August 30, 2007 08:23 PM
Sheesh! I thought they were going to talk about a Beta release of Evolution v3 or something... I use Evolution more than any other application on my desktop, but I'm often frustrated by its bugs and less-than-perfect integration with Outlook calendaring. I'd like to see some real advances. Oh well......

#

Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 24.232.147.45] on August 29, 2007 06:58 PM
I don't see where the mining is... :-(
It's a front end for search tools, and the mining is up to your own brain...

#

Re: Evolution beta is a powerful personal data mining tool

Posted by: Joe Barr on August 29, 2007 07:30 PM
Wikipedia says "Data mining has been defined as 'the nontrivial extraction of implicit, previously unknown, and potentially useful information from data' and 'the science of extracting useful information from large data sets or databases.'" So even doing Google searches alone fits the definition.

#

Bad choice of names

Posted by: Anonymous [ip: 64.28.87.56] on August 29, 2007 07:28 PM
Next he's going to release a new application called "Excel" that isn't a spreadsheet program.

#

Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 41.240.185.213] on August 29, 2007 08:46 PM
I am all open to calling it something else... any suggestions? But don't suggest something like 'Digger' or 'Searcher'; it just won't fly... Roelof.

#

Re: Evolution beta is a powerful personal data mining tool

Posted by: Joe Barr on August 29, 2007 09:24 PM
Hey, Roelof

Good to see you here, and great work on this app. How about IC3Visualizer? Is that taken?

Joe Barr

#

Re: Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 67.88.249.34] on August 29, 2007 09:44 PM
Here are some names
people eater
Miner
FaceMiner
TheSocial ;)
inquest
shakedown

#

Re: Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 24.248.89.66] on August 30, 2007 04:17 PM
How about Thunderbird?

#

Evolution beta is a powerful personal data mining tool

Posted by: Anonymous [ip: 195.210.196.7] on August 30, 2007 09:42 AM
As an alternative check http://www.ailab.si/orange/
It's open source I believe.

#

Re: Evolution beta is a powerful personal data mining tool

Posted by: Joe Barr on August 30, 2007 01:26 PM
"As an alternative check http://www.ailab.si/orange/ It's open source I believe."


Thanks for the tip, I'll check it out.

#

What about: ICData (I See Data)

Posted by: Anonymous [ip: 69.216.135.155] on August 30, 2007 11:57 AM
Evolution is a bad choice.

#

Evolution IS a Trademark of Novell

Posted by: Rick Stanley on August 30, 2007 03:40 PM
Please see the Novell Trademark List (http://www.novell.com/company/legal/trademarks/tmlist.html) for a list of their trademarks. I have sent an email to Novell informing them of the conflict.

#

Re: Evolution IS a Trademark of Novell

Posted by: Anonymous [ip: 129.240.235.122] on September 10, 2007 12:02 PM
Angling for an intern position in Novell's legal department, are we?

#

diaphanous ebenazer

Posted by: Anonymous [ip: 66.122.165.195] on September 12, 2007 04:59 AM
A few years ago 5-10 someone was being upbraided for purporting the idea of a node and socket set of tools. It was said the web page contained enough of what is needed and in a more user-friendly form. Yet it has taken till now for the release of DOM 3 with lists of outgoing and incoming links. It would seem as demonstrated by older applications that in an effort to develop development. An initial node could drag and drop functional utilities from a library, linking them via an initially blank toolbar (giving it a foo quality, if that word can ever be used to describe anything); the linked node can be given a label and linked to siblings, children, and parents likewise. This creates scalability and constructions as demonstrated by circuit construction programs like the spicey kind l,m,n...? as well as graphical math interfaces. The idea is that once a building block is made it can be placed into a library (like most programming languages); by using a graphical interface, languages can be mixed using interfaces and a more visual description of a project is available. The link is a metaphor for a socket or connection, pipe. Once a node is constructed, its content can be linked to a filter, compiler, references used to build the object, interpreter. Siblings could take the form of interpreted expressions so a given functional utility may be duplicated in other languages. If work is being done and overlap exists, an interface hitches a ride on what has gone before until the complete independent expression has been completed. A person can arrow forward, back, up, down to walk through a program or information path. Whether it's absolutely necessary (put my cap on), a unique identification number could label the node; however sss.. somebody has already thought of this. This ID number might identify itself by a prefix of three numbers so you would know those numbers are its ID. Like (444.....) or (555..other numbers go next), also as a reminder or a model of the potential consequence of an absolute identification system. Let's see, is there any difference between a model and the real object?

#
