
Linux.com

Feature: Internet & WWW

Maltego mines the Internet without violating TOS

By Joe Barr on January 02, 2008 (9:00:00 AM)


Not long after Linux.com reviewed Roelof Temmingh's powerful online data mining tool Paterva Evolution a few months ago, Temmingh was forced to remove the application from the Paterva Web site because of complaints that some of the methods he used to harvest data were violating the terms of service (TOS) of the services from which the information was gathered. Recently, Temmingh released a completely redesigned version of the tool -- now called Maltego -- and has made it available again as a free-as-in-beer download.

Like the original, Maltego is written in Java, and it still requires Java Development Kit (JDK) 1.5 or later to run. The GUI looks and behaves much as it did before, but almost everything is new and different under the hood.

Maltego uses numerous methods to search for public information about a variety of entities, such as individuals, phrases, email addresses, URLs, and domain names. These methods are referred to as transforms. In the original release, these transforms were coded as part of the application -- in Maltego, they are not. It has been redesigned so that the Maltego client you run on your machine uses a server -- called a Transform Application Server -- to collect and process the information found by the transforms; the server then returns the results to the client.

This design allows others to write transforms, set up their own Transform Application Servers, or even add their own entity types to conduct searches for virtually any type of information. Users can now modify or add transforms without needing to reinstall the software. The new architecture also allows users to restrict or control the use of transforms through an individual API, thus avoiding the kind of complaints the original design drew for violating the TOS of search firms, social networks, and others.
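To make the split between client and server concrete, here is a minimal sketch in Python. It is not Maltego's actual API or wire protocol, which this article does not describe; the names Entity, TransformServer, Client, and domain_to_mail_host are all hypothetical, and the "server" is just an in-process object rather than a remote service. It only illustrates the idea that transforms live on the server side, so new ones can be registered without touching the client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    """A typed piece of information, e.g. a domain name or an email address."""
    kind: str
    value: str


# A transform maps one entity to a list of related entities.
Transform = Callable[[Entity], List[Entity]]


class TransformServer:
    """Hypothetical stand-in for a Transform Application Server."""

    def __init__(self) -> None:
        self._transforms: Dict[str, Transform] = {}

    def register(self, name: str, transform: Transform) -> None:
        # Transforms are added on the server side; the client is unchanged.
        self._transforms[name] = transform

    def available(self) -> List[str]:
        return sorted(self._transforms)

    def run(self, name: str, entity: Entity) -> List[Entity]:
        return self._transforms[name](entity)


class Client:
    """Hypothetical client that holds no transform logic of its own."""

    def __init__(self, server: TransformServer) -> None:
        self._server = server

    def expand(self, entity: Entity) -> Dict[str, List[Entity]]:
        # Ask the server to run every transform it offers on this entity.
        return {
            name: self._server.run(name, entity)
            for name in self._server.available()
        }


def domain_to_mail_host(entity: Entity) -> List[Entity]:
    # Toy transform: returns canned data and performs no network lookup.
    if entity.kind != "domain":
        return []
    return [Entity("domain", "mail." + entity.value)]


server = TransformServer()
server.register("domain_to_mail_host", domain_to_mail_host)
client = Client(server)
results = client.expand(Entity("domain", "example.com"))
```

In this sketch, adding a second transform is a single register call on the server object; the Client class needs no changes, which mirrors the "no reinstall" property described above.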

Temmingh says that the new design is faster because the transforms do not run locally, "even when you're low on bandwidth. I've tested it with a slow GPRS connection, and it's still very usable."

On the downside, Maltego has less access to information than the original. Unlike Evolution, it can't use Google, because Google's TOS frowns on "scraping," as automated searches are called. Some may find this a bit odd, since Google itself scrapes the Internet to find and catalog the same information it now denies to others using similar tools.

Instead of Google, Maltego uses Yahoo! for its search engine transforms, but limits the number of times Yahoo! can be used during a 24-hour period. Most transforms involving the search of social networking sites have also had to be stripped out for similar concerns over violating their TOS. Two new transforms -- one for RapLeaf and the other for Spock -- try to plug that hole.
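The article does not say how Maltego enforces its 24-hour limit on Yahoo! searches, or what the limit is. As an illustration only, a simple fixed-window quota could look like the following Python sketch; the DailyQuota class, its limit parameter, and the choice of a fixed rather than rolling window are all assumptions, not details of Maltego.

```python
import time
from typing import Callable


class DailyQuota:
    """Hypothetical fixed-window counter: at most `limit` uses per 24 hours."""

    WINDOW_SECONDS = 24 * 60 * 60

    def __init__(self, limit: int, clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self._clock = clock          # injectable so the window can be tested
        self._window_start = clock()
        self._used = 0

    def try_acquire(self) -> bool:
        """Return True and count one use if quota remains, else False."""
        now = self._clock()
        if now - self._window_start >= self.WINDOW_SECONDS:
            # A full window has passed: start a new one with a fresh count.
            self._window_start = now
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

A search transform would call try_acquire() before each query and skip the query when it returns False.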

With all these changes, is Maltego still a powerful tool for doing your own data mining on the Internet? It proved to be so the first time I used it. It took me about 15 minutes to learn how to manage transforms so that I could make use of all 64 of them available; the project provides good user documentation on this subject. Once I figured it out, I was immediately able to find new information on a subject of current interest.

Temmingh says that Maltego will probably go commercial in a future version, perhaps by selling the GUI as commercial software, perhaps by selling custom transforms. The Maltego GUI is now based on a model similar to that of the Metasploit Project, and like that project, with its plugin exploits and payloads, the real power comes from the transforms and servers. As such, it might prove to be a lucrative offering for those with a hankering for customized and controlled intelligence gathering.


Comments

on Maltego mines the Internet without violating TOS

Note: Comments are owned by the poster. We are not responsible for their content.

How is this a good thing?

Posted by: Anonymous [ip: 68.126.191.42] on January 02, 2008 10:32 AM
Give me a good reason as to why someone would use this in a legitimate setting? Otherwise, this is more power to the spammers/data miners/etc.

#

Maltego mines the Internet without violating TOS

Posted by: Anonymous [ip: 76.68.248.33] on January 02, 2008 05:27 PM
Unsurprising that there have been essentially zero third party contributions to this project: It is not free* software.

* Free as in freedom

#

Re: Maltego mines the Internet without violating TOS

Posted by: Anonymous [ip: 91.140.35.28] on January 02, 2008 11:33 PM
I'm all for GPL but are you stupid? It says in the article it is free as in beer not as in freedom. Are you captain Redundant or something?
And BTW who said he needs any contributions? It's his right to publish it under any license he deems appropriate.

#

Re(1): Maltego mines the Internet without violating TOS

Posted by: Anonymous [ip: 24.80.34.124] on January 03, 2008 08:47 PM
I'm all for responsible criticism but are you stupid? The OP is obviously tying together the lack of 3rd party contribs and the lack of a free license. So rather than being redundant the OP is making a point. Maybe you should try it sometime. Who says the project needs 3rd party contribs? Why don't you read the article instead of being an asshat? The article says "This design allows others to write transforms, set up their own Transform Application Servers, or even add their own entity types to conduct searches for virtually any type of information." Presumably the design was deliberate and not accidental so the developer is expecting 3rd party development.

#

free-beer download?

Posted by: Anonymous [ip: 195.69.85.62] on January 09, 2008 01:30 PM
Seems there's no download link again. Do they want registration for that? (er, linux.com auth broke for me -- tired of these atm)

(ah, it's here at least after reg: http://www.paterva.com/web2/maltego/maltego-gui-1.0-download.html -- a bit inobvious, do they filter for basic data mining skills this way? :)

#




 