The U.S. government takes a big data approach to intelligence gathering. And so can you!
June 22, 2013 (Computerworld)
Everybody's talking about PRISM, the U.S. government's electronic surveillance program.
We don't know all the details about PRISM (also called US-984XN). But we learned enough from a badly designed PowerPoint presentation leaked by NSA contractor Edward Snowden to feel outraged by its reach and audacity.
In a nutshell, PRISM (along with related telephone surveillance programs) takes a big data approach to spying on foreign terrorists using American servers.
PRISM and related programs may harvest metadata of every phone call, every email, every Internet search, every Facebook post -- everything -- and use algorithmic filtering to find suspicious communication. Once they've found it, they can get a warrant to listen to the actual phone calls and read the actual email to find clues that enable authorities to stop terrorist attacks before they happen. (You know, Minority Report-style precrime.)
Metadata is not the content of the phone call or email, but the information about them: Who contacted whom, when, from where and for how long.
PRISM inspires shock and awe. But if you set aside the shock part -- the privacy and constitutional implications -- you realize the awe component is worth exploring.
The PRISM approach is this: Cast the widest possible information net, then use machine intelligence to serve up just the needles without the haystack.
PRISM works. It gets government snoops what they're looking for. And if it works for the NSA, it can work for you, too.
In fact, the ideas behind PRISM are built into a wide variety of tools available to everybody.
So here's how to run your own private PRISM program:
1. Capture massive amounts of data
One of the NSA's goals is to record the metadata on every phone call and email.
Obviously, no human personally reads all that data. But it's copied and stored anyway for searching later.
You can take the same approach. One easy way is to use integrated Google services together.
Google now offers 15 GB of free storage that can be divided any way you like between Gmail, Google Drive and Google+ photos. And they'll give you more if you pay for it.
Google also offers an Alerts service that searches the Internet and mails you the results. Most people set up only the number of Alerts that they can read. But that's not the NSA way.
The PRISM approach would be to harvest far more Google Alerts than any human could possibly process, then use Gmail filters to automatically skip the inbox and send them straight to a specially created folder within Gmail. You can set up new Alerts every day, each time you think of an area of interest. These can include people you know, companies to watch, ideas to keep up with.
Alerts won't send you the data (the story), but the metadata (information about the story, plus the link). One advantage of this approach is that if a site is deleted, making it vanish also from Google Search, you'll still have a record of it with enough metadata to pursue leads.
Note that Google also offers Google Scholar Alerts, which works like regular Alerts but searches academic books, papers and other resources. This is one of the great underappreciated services on the Internet.
You can also spy on yourself NSA style by capturing the metadata on your phone calls and chats. (Of course, the email is already there.)
The trick is to use Google Voice, and turn on the features that save your information to email. (Note that Voice will send your data to any email address, not just a Gmail one.) You'll find the appropriate checkboxes under the Voicemail & Text tab of Google Voice Settings.
This will send metadata on all of your calls, plus full data on all your SMS chats, transcripts of your recorded calls and voicemails and even the sound recording of your voicemails for searching later.
Note that Google's new Hangouts feature, which is accessible in Gmail, Google+ and in the dedicated Hangouts mobile apps, will send the full text of all your chats plus metadata on your video calls to the Gmail address associated with your Google+ account.
You can also use tools like IFTTT or Zapier to automatically drop all content or metadata from any RSS feed into Google Drive, or into alternatives like Evernote, for searching later on.
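For readers who would rather script this step than rely on IFTTT or Zapier, here is a minimal sketch of the same idea in Python: pull only the metadata (title, link, publication date) out of an RSS feed and leave the story itself behind. The sample feed, the example.com URLs and the function name extract_metadata are made up for illustration. A real feed would have to be downloaded first, and feeds in other formats, such as Atom, would need different tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without a network call.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
      <pubDate>Sat, 22 Jun 2013 10:00:00 GMT</pubDate>
      <description>Full text we choose not to keep.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
      <pubDate>Sat, 22 Jun 2013 11:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def extract_metadata(rss_text):
    """Return one dict of metadata (title, link, published) per feed item."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        records.append({
            # findtext returns None for a missing tag, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return records


if __name__ == "__main__":
    for record in extract_metadata(SAMPLE_RSS):
        print(record["published"], "|", record["title"], "|", record["link"])
```

The description field is deliberately dropped: the point is to keep a small, searchable record that still leads back to the story.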
Remember: Do it the NSA way and go nuts with this, dropping dozens, or even hundreds of items per day into your searchable storage. Don't worry about having too much data. Have faith in existing and future search tools to later find what you're looking for.
Beyond the automated harvesting of data, don't forget the manual approach, either. Capture every document that might someday be relevant and dump it into a special folder in Google Drive by using a browser extension like the Save to Google Drive plug-in. (Chrome has other extensions and so do other browsers.) You can do similar one-click saving using Evernote Web Clipper.
Once all this data and metadata is pouring into Gmail and Drive, you can simply use Google's search features to find what you're looking for.
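Gmail and Drive handle the searching for you. If you also keep harvested metadata outside Google, the same capture-now, search-later idea can be sketched with Python's built-in sqlite3 module. Everything here (the items table, its columns and the function names) is a hypothetical layout chosen for illustration, not something any of the services above expose.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small metadata store; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "link TEXT PRIMARY KEY, title TEXT, source TEXT, captured TEXT)"
    )
    return conn


def save_item(conn, link, title, source, captured):
    """Store one record; a link that is already present is left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?)",
        (link, title, source, captured),
    )
    conn.commit()


def search(conn, term):
    """Return (title, link) pairs whose title contains the term."""
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    rows = conn.execute(
        "SELECT title, link FROM items WHERE title LIKE ? ORDER BY captured",
        (f"%{term}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store()
    save_item(store, "http://example.com/a", "PRISM explained", "alert", "2013-06-20")
    save_item(store, "http://example.com/b", "Big data basics", "rss", "2013-06-21")
    print(search(store, "prism"))
```

Because the record is stored locally, it survives even if the original page is later deleted, which is the advantage of keeping metadata described earlier.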
The key to great NSA-style data harvesting, by the way, is to constantly tweak your code. Keep adding, deleting and modifying your Google Alerts and RSS feeds to make sure they deliver the kind of data you want.
2. Use algorithmic filtering
Algorithmic "noise filters" are popping up everywhere these days, especially on social networks and social media services where users could be overwhelmed by too much information.
But thinking like the NSA, we can use these filters to cast a massively wide information net, then let the filters weed out duplicate and irrelevant information for us. (Note that I got this tip from a conversation with blogger Robert Scoble this week.)
The idea is to set up a special-purpose Twitter feed for information harvesting, then use it to follow vastly more content sources than any human could possibly keep up with.
Then, read that feed using Flipboard, Prismatic or some other site that filters content for you and that supports Twitter. (Note that these services also support Facebook and Google Reader, but Google will discontinue Reader soon. Twitter is probably your best bet.)
One thing these filters do well is eliminate content duplicates. Instead of getting 500 stories about the name of Kanye and Kim's baby, you'll get just one story -- probably the best or most popular one -- and get it over with.
Another way to think about the power of algorithmic de-duping is that normally you might not follow a news source from which only one story in 100 is unique or exclusive. But because duplicate stories are filtered out, you get only the one unique story from that source and not the 99 also-ran stories.
This elimination of duplicates frees you to follow news and content sources promiscuously, casting an ultra-wide net without fear of overloading yourself with redundant content.
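Flipboard and Prismatic don't publish how their filters work, so the following is only a rough stand-in for the de-duping idea, not their method. It treats two headlines as duplicates when their word sets overlap heavily (Jaccard similarity) and keeps the most popular story from each group. The threshold of 0.6, the popularity scores and the sample headlines are all assumptions chosen for illustration.

```python
import re


def words(title):
    """Lowercase a headline and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(stories, threshold=0.6):
    """Keep the most popular story from each group of near-identical headlines.

    stories is a list of (title, popularity) pairs.
    """
    kept = []
    # Visit the most popular stories first so they win over their duplicates.
    for title, popularity in sorted(stories, key=lambda s: s[1], reverse=True):
        title_words = words(title)
        if all(jaccard(title_words, words(k[0])) < threshold for k in kept):
            kept.append((title, popularity))
    return kept


if __name__ == "__main__":
    feed = [
        ("Kim and Kanye name baby North West", 50),
        ("Kanye and Kim name their baby North West", 90),
        ("NSA contractor leaks PRISM slides", 70),
    ]
    for title, popularity in dedupe(feed):
        print(popularity, title)
```

With the sample feed, the two baby-name headlines collapse into the more popular one and the unrelated NSA story survives. Word overlap on headlines is crude. A real filter would presumably also look at the article text, the links and who is sharing the story.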
3. Don't forget the new photograph recognition tech
One of the amazing spy tools at the disposal of the NSA is the ability to process photographs for face, object and location information.
These tools are at your disposal, too.
Facebook's new Graph Search feature lets you quickly experiment with finding photos by trying different queries. For example, if you search for "Pictures taken by people who work at ..." followed by a company, you'll get what you asked for. (This is one way to spy on a competitor, for example.)
Google's picture searching takes it even further, enabling you to search not only for tags, keywords, associated text and location, but also content categorization. Google can actually recognize objects, landmarks and other stuff, even if the person who posted it added no such context.
For example, if you search Google+ for something like Sydney Opera House, you'll get a massive trove of pictures of the building, many of which are not accompanied by any mention of the words Sydney, Opera or House. Google actually recognizes the building using machine intelligence.
The same goes for categories of things. You can search for the word "car," which is not a specific thing but a type or category of thing. Google still gives you cars, whether they're tagged or not.
There's one ironic caveat to using the NSA's methods for wide-scale information harvesting and algorithmic filtering, which is that the NSA may theoretically know everything you're doing.
The NSA's domestic surveillance programs are controversial and possibly unconstitutional. But let's face it: They work.
And the NSA's methods can work for you, too....