Defense, Intel Communities Wrestle With the Promise and Problems of ‘Big Data’ (UPDATED)
For some intelligence analysts, the glut of data coming from multiple sources is an overwhelming problem. There just aren’t enough hours in the day to sift through all the potentially valuable information.
For others, this is a potential boon. Never before has there been so much data available. If they could somehow efficiently pick out the best pieces of information and fuse them with other sources, they could put together a clearer picture of what is happening in the world.
This all falls under the latest industry buzzword, “big data.” The term has different definitions, depending on who is describing it, but it has the potential to revolutionize the way the defense and intelligence communities collect and interpret information, experts told National Defense.
“Right now, this is like the Dark Ages,” said James Canton, a futurist who consults with intelligence agencies, the military and the commercial sector.
“We have a long way to go in the I.C. [intelligence community] and defense community when it comes to modernizing — let alone thinking about the future of information technology — to be able to do what even the private sector can do,” said Canton, who is CEO and chairman of the Institute for Global Futures in San Francisco.
The potential of big data is a world where computers automatically sort through several different information sources (sensors aboard unmanned aerial vehicles, social media like Twitter, shipping manifests, and so on), pick out the important data, piece them together in real time, and send them to an analyst’s smartphone in an easy-to-comprehend report.
Canton specializes in what he calls “the extreme future.” This is a vision. It could happen, but there is a possibility that it won’t. The main reasons that it may not come to pass wouldn’t be surprising to anyone who follows the federal bureaucracy.
Databases are kept in so-called “silos,” unconnected to each other, and operated by bureaucrats who don’t want to share what they know. The systems are old, and the acquisition system to refresh outdated government IT is slow, Canton said.
“Unless we accelerate this big data strategy and bring it to the big table, it’s not going to happen. You’re going to have missed opportunities like we have had in the past,” Canton said. He was reluctant to share what some of the “three-letter agencies” were doing because he currently acts as a consultant to them. Some, like the Central Intelligence Agency and the National Reconnaissance Office, are doing some good preliminary work, he said.
But none are as agile as the private sector, which has used big data techniques for years to understand its customers better and to maximize profits. Companies are much more willing than the government to tear down silos and get rid of outdated information technology systems, he noted.
“My concerns are that we are going to put a layer of big data, analytics, and prediction on top of an existing bloated system with lots of different stakeholders, with a lot of redundancy ... and that’s not the way to do it,” Canton said.
It’s true that the government has a lot to learn from the private sector, and it should leverage some of the off-the-shelf software and algorithms being used, said Keith Johnson, director of advanced analytics at Lockheed Martin’s information systems and global solutions division.
But the military has a tougher technological nut to crack, he said.
The Targets and Walmarts of the world are good at understanding their customers. If a shopper buys a box of diapers, it probably means they have a baby. A few days later, a coupon for formula is mailed to their address.
“That honestly is pretty easy,” Johnson said. “It is not rocket science.”
Finding a person or object that appears for a few seconds within hours of drone video, working out what that sighting might mean in light of other intelligence sources, and then putting it all together to get a better understanding of a terrorist organization is a hard problem, he said.
It is the difference between “structured” and “unstructured” data. Customer information such as name, address, age and sex is all in easy-to-query databases.
A military analyst may have to sort through unstructured hours of imagery from a satellite to find one piece of important intelligence. Social media is a new source of information. But all those tweets on Twitter amount to millions of messages. Most of them are of no importance.
“Saying a terrorist is more likely to strike on a Thursday morning — that’s not good enough,” said Johnson. “We need to know who, when, where and how to stop them. That is something the commercial sector is not necessarily working on.”
Some vendors are putting together sophisticated algorithms that draw on multiple databases to give analysts an in-depth picture of what is happening in areas they are monitoring.
TerraEchos, a Missoula, Mont.-based company, has demonstrated its Zspace Visualization system to a handful of government agencies.
“We really want to get to taking any kind of sensor — whether it be human [intelligence], or whether it is automated. It can be video, signals, anything. And we want to intuitively design that so someone can look at what is going on now, and reduce the time to decision,” said Dan James, vice president of products at the company.
He demonstrated two scenarios where the software can be used.
In one, a military base is undergoing a combined physical and cyber-attack. One database picks up evidence of hackers attempting to take control of security systems. Meanwhile, cameras set to alert guards when there are four or more people lingering outside the front gate are cued. All this would raise the threat level. A security officer can see all this unfold on a map of the facility.
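The logic of this first scenario can be sketched in a few lines of Python. This is only an illustration: the function name, the threat-level labels and the way the two signals are combined are assumptions, and nothing here reflects how the TerraEchos software is actually built. Only the four-person threshold comes from the demonstration.

```python
LOITER_THRESHOLD = 4  # from the scenario: four or more people lingering at the gate


def threat_level(cyber_intrusion_detected: bool, people_at_gate: int) -> str:
    """Return a hypothetical threat level fused from two independent feeds."""
    camera_alert = people_at_gate >= LOITER_THRESHOLD
    # Count how many of the two feeds are currently raising an alarm (0, 1 or 2).
    signals = int(cyber_intrusion_detected) + int(camera_alert)
    # Assumed labels: no signal, one signal, both signals at once.
    return ["normal", "elevated", "high"][signals]


print(threat_level(cyber_intrusion_detected=True, people_at_gate=5))
```

The point of the sketch is that neither feed alone triggers the highest level; it is the combination of the cyber and physical signals that does.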
In another scenario, a U.S. consulate in a volatile nation monitors social media. A streaming database monitoring GPS-enabled mobile devices sending out tweets picks up evidence of an impending demonstration or riot. The devices appear as dots on a map, and start to converge on a neighborhood. An unmanned aerial vehicle may be automatically sent to fly over the area to collect live, streaming video to observe any crowds that might be gathering. Police vehicles with GPS trackers that appear in another database, also represented as dots on a map, could then be dispatched to tamp down any unrest.
James, a former military analyst, stressed that humans aren’t controlling these actions. With today’s data glut, an intelligence officer sitting at a workstation querying a database is like someone searching for the proverbial needle in a haystack.
“You’re asking an analyst or a decision-maker to ask the right question. That just doesn’t work anymore. There are too many databases out there. There’s too much information, and the structure of a question is too narrow. They are too overwhelmed,” he said.
“They need something that is continually running, to raise up what is important, because an analyst is never going to find it,” he added.
Social media is the latest part of what Canton called the “data tsunami.”
The role popular Internet applications such as Twitter, Facebook and Imgur played in the Arab Spring, particularly in Egypt, was a wake-up call for the governments of the world. The intelligence community largely was not paying attention to the relatively new Twitter. That won’t be the case in the future.
BBN Technologies, a Raytheon-owned company that works primarily with the Defense Advanced Research Projects Agency and its Homeland Security and intelligence community counterparts, has developed software that can make meaning out of the millions of Twitter posts in foreign languages.
BBN specializes in real-time translations. That know-how is now being applied to social media, said Sean Colbath, a senior applications engineer.
“Big data is anything that one person is not willing to look at in eight hours,” he said, offering his definition of the problem. At the end of the work day, analysts need to file a report, and go home like everyone else.
Linguists are some of the intelligence community’s most precious human resources. Once, they would be paid to sit and watch one television news station all day — most likely in Russian.
BBN’s broadcast monitoring system now automatically translates 15 different languages in real time. It’s not always precise, but it conveys the basic meaning, and most importantly, it is searchable with key words. Analysts can call up dozens of different media references for one event. The company has similar systems that monitor Internet news sites and radio broadcasts.
The next challenge is the high volume of information that is being voluntarily posted on websites in foreign languages. BBN’s Social Media Analytics tool allows the analyst to pick hundreds of key influential users on Twitter, and follow what they are tweeting about.
No analyst could ever have the time to interpret and correlate all the related messages from these key “influencers,” Colbath said. The number of retweets and hashtags from 300 users rises exponentially to millions of related messages per month.
The software looks for key words expressing sentiment. Eighty percent of tweets express no sentiment at all, he estimated. But 20 percent do. Of those 20 percent, about 80 percent are expressing negative feelings. This 80-20 rule, as Colbath likes to call it, shows what he believes is a truism: People go to the Internet to complain.
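Worked through, those estimates mean about 16 percent of all tweets carry a complaint. The short Python sketch below does the arithmetic; the 1 million sample size is an arbitrary assumption for illustration, and the function and parameter names are hypothetical. The percentages are Colbath’s estimates.

```python
def sentiment_breakdown(total_tweets: int,
                        sentiment_pct: int = 20,
                        negative_pct: int = 80) -> dict:
    """Apply the estimated percentages using integer arithmetic."""
    with_sentiment = total_tweets * sentiment_pct // 100
    negative = with_sentiment * negative_pct // 100
    return {
        "no_sentiment": total_tweets - with_sentiment,
        "with_sentiment": with_sentiment,
        "negative": negative,
        "negative_pct_of_all": negative * 100 // total_tweets,
    }


# Assumed sample size of 1 million tweets, chosen only to make the numbers round.
print(sentiment_breakdown(1_000_000))
```

On those assumptions, 800,000 messages express nothing, 200,000 express some sentiment and 160,000 of them are negative.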
Not all the complaints are of interest. They might be expressing dislike for a movie, for example.
But even when all the relevant tweets are compiled, it is still too much for one person to sort through. The software finds key words, and shows sentiment, or what citizens are talking about, in pie charts, bar charts, or overlaid on a map.
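A toy version of that keyword approach might look like the Python sketch below. It is not BBN’s method: the word lists, the sample messages and the function names are all invented for illustration, and it handles only English text, whereas the company’s tool works on translated foreign-language posts.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real system would use far richer vocabularies.
NEGATIVE_WORDS = {"angry", "corrupt", "hate"}
POSITIVE_WORDS = {"proud", "hope", "love"}


def classify(message: str) -> str:
    """Label a message by the first keyword list it matches, if any."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & NEGATIVE_WORDS:
        return "negative"
    if words & POSITIVE_WORDS:
        return "positive"
    return "none"


def tally(messages: list[str]) -> Counter:
    """Count messages per sentiment label, ready to feed a pie or bar chart."""
    return Counter(classify(m) for m in messages)


sample = [
    "The ministers are corrupt",
    "I love this city",
    "Traffic on the bridge at noon",
    "So angry about the power cuts!",
]
print(tally(sample))
```

The resulting counts are the kind of summary that could be drawn as a pie chart or bar chart instead of being read message by message.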
“Even if it is all translated from Arabic into English, it is still too much. I have to turn it into some kind of pictorial representation,” Colbath said.
Ultimately, analysts can answer questions about how populations feel about their governments, or their attitudes about world events.
Canton said: “Twitter, Google, they have a tiger by the tail. They themselves don’t fully understand the future of this technology.”
Lockheed Martin is pursuing data fusion technology across all sectors in which it does business, Johnson said.
“The proliferation of sensors and the ability to transport it back to users, and the proliferation of open sources, social media, mobile devices have opened the [military’s] eyes to the possibility of fusing all this information together,” he said.
“Big data technologies are going to penetrate through every customer set that we have,” he said. Lockheed Martin is working with not only military and intelligence organizations, but civilian agencies such as the Social Security Administration, Medicaid and Medicare. Its products analyze data to better understand medical outcomes, or correlate different databases to root out waste, fraud and abuse.
Big data tools can be applied to everything from monitoring the spread of diseases to cyberthreats. They can also look for the one piece of crucial data “in the deluge,” he said.
One example is the Twitter users who, after an August 2011 earthquake in Mineral, Va., reported the incident 120 seconds before the U.S. Geological Survey. Another is the man in Abbottabad, Pakistan, who sent out tweets about helicopters hovering near his house on May 1, 2011, at 1 a.m. Those helicopters carried U.S. commandos on their way to kill Osama bin Laden.
“How do I know that wasn’t an important tweet, something I should pay attention to?” Johnson said.
Canton said key leaders in the federal government haven’t quite grasped the potential of big data. The Joint Chiefs of Staff and chief information officers from across the federal government need to get together in the same room and start “white boarding” where they want to go with the technology, he said. “That hasn’t happened yet.”
Ultimately, this can lead to prediction and prevention of conflicts and the promotion of security, he said.
The double-edged sword of big data is what he calls “Canton’s Law”: Everything that can be connected will be connected.
In democratic nations, privacy issues are of paramount importance. There has to be good governance in place, and respect for individual freedoms, he said.
“Not every big data fix will be within the boundaries of the law. You have to be careful about that … and it is important to put that on the agenda,” he added.
Corrections: In the original article, James Canton was identified with the incorrect first name. Also, the original version of the story had stated that TerraEchos had sold its software to government customers.
Photo Credit: TerraEchos