Lately, I’ve been doing some work on social networks, especially on the extraction part. Such work typically involves hacking together a script that runs some operations on a dataset, for example the Enron dataset. During my most recent work, I’ve been using Gephi for visualizing the results. Much of the reason for using Gephi is its ease of use, awesome layout algorithms and simple data import.
What Gephi lacks, however, is the concept of entity types. You have one type of node and one type of edge. When analyzing social networks, this becomes a constraint whenever visually differentiating between various types of actors is essential. Other tools exist that can help with such analysis, but they are often proprietary and don’t seem to be very fond of importing and exporting data in an open and standardized way.
Maltego and its little brother CaseFile are indeed commercial tools, but may be very useful for the type of work I was doing. The sheer number of entity types and the ability to add notes and images to objects are very appealing! The problem here, too, is that their import wizard is quite limited.
When opening the import wizard, you are given the option to open CSV and XLS files. After choosing your file, you are asked to map columns to their respective entity types. Next, you define which columns should be linked to which other columns. You also have to define the direction of each edge in a drag-n-drop fashion.
After having done all your mapping and pressed “Ok”, you are sent to a page notifying you whether your mappings are correct or not. If they are, good for you! If they aren’t, start over. There is no back button at this point.
When you have successfully imported your data, you can admire the results for a couple of moments before being struck by one of those “it would be even better if …” ideas. You do your ninja-style hacking on your scripts and reopen the import wizard. At this point you might expect to see an option to use the mappings you used last time. You’d be wrong. There is no way to store your mappings, and Maltego/CaseFile does not want to remember your previous settings. So you do the whole import process over again, hoping to remember which way the edges were supposed to point, and which columns represented which entity type. Kudos to Paterva for making an awesome data analysis tool, but as you might understand by now, there is some room for improvement when it comes to importing.
In addition to requiring a whole lot of steps just to view your data, the limitation that a column can only represent one entity type quickly proved too restrictive. All these limitations motivated my maltego-importer (GitHub) project. The idea was to imitate how Maltego/CaseFile stores data when copying and pasting entities. It turned out that Paterva has made it simple for us: the nodes and edges are stored as GraphML!
By looking at the XML elements Maltego/CaseFile uses for storing various entities, I’ve made a parser that currently reads a CSV file such as:
GangMember, Jon Doe, Male, Some random guy, Shoots
Female, Some lady, Male, Some random guy, Sees
Female, Some lady, LawOfficer, Policeman, Calls
LawOfficer, Policeman, Male, Some random guy, Helps
LawOfficer, Policeman, GangMember, Jon Doe, Arrests
By saving the contents above to a file, adding that file as input to the importer and copying the returned string into Maltego/CaseFile, you get the following graph:
From my own viewpoint, this makes quickly viewing the results of my scripts a whole lot easier!
The importer also supports setting the values of other properties in the original entities, but that currently requires some manual work.
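To give a feel for the idea, here is a minimal Python sketch of turning the CSV format shown earlier into GraphML. It is not the actual maltego-importer code: the function name rows_to_graphml and the attribute keys (type, name, label) are made up for illustration, and the output is plain, generic GraphML rather than the Maltego-specific XML elements the real importer produces. It assumes every row has exactly five fields: source type, source name, target type, target name and edge label.

```python
import csv
import io
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

SAMPLE = """\
GangMember, Jon Doe, Male, Some random guy, Shoots
Female, Some lady, Male, Some random guy, Sees
Female, Some lady, LawOfficer, Policeman, Calls
LawOfficer, Policeman, Male, Some random guy, Helps
LawOfficer, Policeman, GangMember, Jon Doe, Arrests
"""


def tag(name):
    """Return an element name qualified with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def rows_to_graphml(csv_text):
    """Convert (type, name, type, name, label) rows into a GraphML string."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(tag("graphml"))

    # Hypothetical attribute keys; Maltego/CaseFile uses its own schema.
    for key_id, target in (("type", "node"), ("name", "node"), ("label", "edge")):
        ET.SubElement(root, tag("key"), {
            "id": key_id,
            "for": target,
            "attr.name": key_id,
            "attr.type": "string",
        })

    graph = ET.SubElement(root, tag("graph"), {"id": "G", "edgedefault": "directed"})
    node_ids = {}  # (entity type, name) -> node id, so repeated entities are merged

    def node_for(entity_type, name):
        """Return the id of the node for this entity, creating it if needed."""
        key = (entity_type, name)
        if key not in node_ids:
            node_ids[key] = f"n{len(node_ids)}"
            node = ET.SubElement(graph, tag("node"), {"id": node_ids[key]})
            for data_key, value in (("type", entity_type), ("name", name)):
                ET.SubElement(node, tag("data"), {"key": data_key}).text = value
        return node_ids[key]

    for index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if not row:
            continue  # skip blank lines
        # Raises ValueError if a row does not have exactly five fields.
        src_type, src_name, dst_type, dst_name, label = (f.strip() for f in row)
        source = node_for(src_type, src_name)
        target = node_for(dst_type, dst_name)
        edge = ET.SubElement(graph, tag("edge"), {
            "id": f"e{index}",
            "source": source,
            "target": target,
        })
        ET.SubElement(edge, tag("data"), {"key": "label"}).text = label

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_graphml(SAMPLE))
```

Because nodes are keyed on the (type, name) pair, each row carries its own entity types. That sidesteps the one-entity-type-per-column restriction of the import wizard.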
The project can be found here: https://github.com/pcbje/maltego-importer
Edit: I’ve written a GUI client for this importer.