If you’re happy and you know it … so might they. In time, they may even know what you’ll do next.
Albert Einstein once said that information is not knowledge. That doesn’t seem lost on anybody as I sit and stare at one of the most interesting (and powerful) social media analytics projects in the country.
I’m in an unassuming building on the campus of the National Research Council, imaginatively named M-50. Richard, the photographer, arrived at the same time, slipping in next to me in the parking lot. Both of us found the building by chance, driving aimlessly around the sprawling complex’s circuitous roads because I forgot to download the map I’d received earlier.
We entered the building early, striding through double glass doors, and waited around the bright, roomy lobby after checking in with the Commissionaire. An old, plush recliner stood out amongst some pristine-looking office furniture carried into place around the same time Sarah McLachlan released the single, “Building a Mystery.”
A few minutes later, Richard and I met our guides and were escorted down a long white corridor to a large, nondescript room with a collection of brown laminate tables in the middle and a screen at one end.
Funny, I thought to myself. You’d think the brightest minds in Canada, working on some of the most technologically advanced projects in the world, would have facilities on par with Tony Stark or Professor X (for you comic book fans). But as mere mortals without fictional Hollywood budgets, they make do with PowerPoint and whiteboards like the rest of us.
Seated by the laptop at the front of the room was Pierre Isabelle, Research Officer, Multilingual Text Processing, and to my right, Andrew Scheidl, Multimedia Analytic Tools for Security Program Lead. Scheidl opens with an explanation of NRC. “We’re really about bringing science to improve the lives of Canadians. We do that in part by inventing helpful technologies, but also by developing and transitioning the right ones to help Canadian companies gain competitive advantage.”
And today, they’ve invited us to experience a fascinating set of those technologies. About two years ago, Isabelle explains, they began thinking about how to apply some of the language technologies they had created for other sectors to the defence and security domains.
“In that respect, the notion of Big Data was already very visible, very important,” Isabelle explains. “We thought maybe people needed to examine social media. Social media ooze text, and this is what we’re good with: sorting text, organizing text, translating text, searching text for sentiment and emotions and extracting relevant information.”
With that in mind, Isabelle decided to submit a project proposal about mining social media using the latest of the NRC’s natural language technologies. Isabelle is proud of the NRC’s accomplishments in this domain, referring to it as “world class.”
What followed was a collaborative project within the Canadian Safety and Security program, undertaken jointly between the NRC, Thales Canada, and MediaMiser. (The Centre for Security Science itself is a collaborative office of Defence Research and Development Canada (DRDC) and Public Safety.) Scheidl informs me that the project is actually just wrapping up. “This is a two-year project,” he says. “And it ends this month, officially. It’s been a great experience for our researchers, drawing together a number of our state-of-the-art technologies in a single platform.”
To explain the project, officially known as “Countering Security Threats using Natural Language Technology,” Isabelle begins by painting a picture. Imagine massive amounts of rapidly flowing, publicly available information from Facebook, Twitter, blogs and newsfeeds gushing through gigantic pipes.
Using existing software supplied by MediaMiser, an Ottawa-based media monitoring company, Isabelle is able to filter out potentially relevant text as it flows.
“Different datasets get extracted from the web, and that’s a decision we make early because we can’t collect the whole pipe. It’s much too big,” he says.
It’s at this point the technologies Isabelle and his team have created come into play. As the relevant text comes in, it is machine translated (if necessary) and summarized, and interesting entities such as people, places and organizations are detected. But that’s not all. Probably the most interesting component of this process is that the text is automatically analyzed for sentiment. “By sentiment,” Isabelle begins, “we mean, is the attitude of the author negative, positive, or neutral?”
As the information passes, on the fly, computers tag it with the appropriate human emotion. “We attach real names of emotions to text, or sentences in the text, like, ‘This is expressing anger. This is expressing fear.’ That sort of thing,” Isabelle says.
At that moment, it isn’t readily apparent just how powerful a tool that can be. But Isabelle isn’t finished explaining the process.
After the texts are “annotated,” they finally end up in a special purpose database at Thales Canada’s research facility in Quebec where the information is indexed in order to become searchable. Thales Canada provides the visualizations of the analyzed data we are shown.
“Each of these technologies are already proven,” Isabelle admits, “so the goal of this project is not to develop them, but merely to show their usefulness in the context of pulling out information for security purposes.”
Thales’ web-based interface allows you to search through that information the same way you would do a Google search… except that this search yields some pretty powerful insights.
It’s here, in this room within building M-50, that I find myself looking out at the frontier of big data analytics. The view is fascinating, but also a little frightening at the same time.
Isabelle motions to the screen in front of me. It’s a projected view of his laptop monitor displaying a web-based interface, the program developed to search through the annotated data. I jump ahead, asking him if he could use it to find a single person — someone like Parliament shooter Michael Zehaf-Bibeau, or Martin Couture-Rouleau — who was planning on carrying out a terrorist attack.
The simple answer is that in the future we will become increasingly good at detecting threats ahead of time, even when posters try to hide in the mass. For example, even though social media users often turn off the geolocation function on their mobile devices, it is now possible to geolocalize their posts within an average distance of eight kilometers. “We could probably do even better than that,” Isabelle says.
With the interface, users can perform fine-grain queries on certain assets, precisely narrowing down the things that are of interest. “The collections can be queried on all sorts of dimensions,” Isabelle says. “Ordinary words, series of words, hashtags, authors, places…a given time period or specific location.”
For example, users could type in something like this: “Give me all the text that mentions the Pan Am Games with negative emotions.”
The results would yield a trove of information that can be restricted to specific dates and/or geographic location. Therefore, the system could be used to figure out how Torontonians feel about the Pan Am Games in the week leading up to the event, or alternatively, to find those people who may pose a potential threat. The NRC team makes clear that for this technology demonstrator, the focus is on foreign events.
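To make the idea of querying on several dimensions concrete, here is a minimal sketch in Python. It is not the Thales interface or its query language, neither of which is described in detail; the `Post` record, the `search` function and the sample posts are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Post:
    """Hypothetical annotated post; the field names are assumptions."""
    text: str
    posted: date
    place: str
    sentiment: str  # "negative", "positive" or "neutral"

def search(posts, phrase, sentiment=None, start=None, end=None, place=None):
    """Return posts that mention `phrase` and satisfy every filter supplied."""
    phrase = phrase.lower()
    hits = []
    for p in posts:
        if phrase not in p.text.lower():
            continue
        if sentiment is not None and p.sentiment != sentiment:
            continue
        if start is not None and p.posted < start:
            continue
        if end is not None and p.posted > end:
            continue
        if place is not None and p.place != place:
            continue
        hits.append(p)
    return hits

# Made-up sample data; the dates are illustrative only.
posts = [
    Post("Traffic for the Pan Am Games is a nightmare", date(2015, 7, 6), "Toronto", "negative"),
    Post("So excited for the Pan Am Games!", date(2015, 7, 7), "Toronto", "positive"),
    Post("Pan Am Games road closures again", date(2015, 6, 1), "Toronto", "negative"),
]

# "Give me all the text that mentions the Pan Am Games with negative emotions",
# restricted to one week and one city.
results = search(posts, "pan am games", sentiment="negative",
                 start=date(2015, 7, 3), end=date(2015, 7, 9), place="Toronto")
```

Each filter is optional, so the same function covers a broad search on a phrase alone and a narrow one restricted by date, place and sentiment.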
Isabelle brings up a “card” containing information they’ve been gathering on Syria. It’s essentially a timeline, which shows an evolution in posts about that country over a given period. “You can see each day of the week, the number of postings, and the general sentiment,” Isabelle points out. “Red means negative, green means positive, and blue means neutral.” Over the span of one year, they have 71 million posts on Syria: 30 million in English and 40 million machine-translated from Arabic.
Scheidl and Isabelle are adamant that although the system itself can provide mountains of useful information, it still requires a skilled analyst in order to make sense of it all.
To prove the point, they search for all documents that mention both Syria and chlorine attacks, and the system returns 12,000. “Here you can see a big peak,” Scheidl says, pointing out the mountain on the timeline. He then narrows down his search to include only the peak, and the posts are further reduced to 4,000. It’s safe to say that this “peak” represents a significant event, and each of those 4,000 posts will be annotated with a certain emotion. If this were a real-life scenario, and I were an intelligence analyst, I could use the system’s entity recognition to narrow down actors and locations involved in the event, possibly generating some solid leads on who may have been responsible.
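The step of spotting a peak on the timeline and then narrowing the search to it can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, the dates are made up, and the "peak" is taken to be the single busiest day plus one day on either side, which may not match how the demonstrator defines it.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(post_dates):
    """Count how many posts fall on each day."""
    return Counter(post_dates)

def peak_window(counts, days_either_side=1):
    """Return (start, end) dates surrounding the busiest day."""
    peak_day = max(counts, key=counts.get)
    pad = timedelta(days=days_either_side)
    return peak_day - pad, peak_day + pad

def restrict(post_dates, start, end):
    """Keep only the posts dated inside the window, inclusive."""
    return [d for d in post_dates if start <= d <= end]

# Made-up posting dates: a burst around one day, plus a straggler.
post_dates = (
    [date(2014, 4, 10)] * 2
    + [date(2014, 4, 11)] * 5
    + [date(2014, 4, 12)] * 3
    + [date(2014, 4, 20)]
)

counts = daily_counts(post_dates)
start, end = peak_window(counts)
in_peak = restrict(post_dates, start, end)
```

In this toy data set the busiest day is April 11, so the window runs from April 10 to April 12 and keeps 10 of the 11 posts.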
Unfortunately — and this is where a human brain comes in handy — sometimes emotions tied to certain events can be deceiving. “You’ll get some that are expressing joy at the negative effects of the attack, and you’ll get some that are celebrating the heroism of the lives of the victims, which will register positively,” Scheidl explains. “You’d need a good analyst.”
Isabelle backs him up. “Yes, we’re not trying to replace an analyst, by far. We just provide more tools.”
And although the NRC-Thales-MediaMiser team has unquestionably accomplished that, Scheidl tells me that they’re not finished. “This project is ambitious, but it was even more ambitious than you see here,” he says. “We had other NRC technologies that were in the pipeline to be implemented.”
Aberration and trend detection was one of those things. If an intelligence analyst is interested in neo-Nazis, but they haven’t been overly active online, the system will automatically be able to monitor the activity, alerting the analyst when there is a peak.
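The article does not say how the NRC's aberration detection works, so the following Python sketch swaps in a common, simple technique: flag any day whose post count sits well above the average of the preceding week. The function name, the seven-day window and the threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def spike_days(counts, window=7, threshold=3.0):
    """Return indexes of days whose count exceeds the trailing mean
    by more than `threshold` standard deviations."""
    alerts = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a nearly flat baseline does not
        # raise an alert on a change of one or two posts.
        sigma = max(pstdev(baseline), 1.0)
        if counts[i] > mu + threshold * sigma:
            alerts.append(i)
    return alerts

# Made-up daily post counts for a normally quiet topic.
quiet_topic = [2, 3, 1, 2, 2, 3, 2, 2, 40, 3]
alerts = spike_days(quiet_topic)
```

With these numbers only the ninth day (index 8), where the count jumps to 40, raises an alert. The day after does not, because the spike itself has inflated the baseline.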
Although the NRC set out to prove that their natural language technologies could be applied to the defence and security sector, they’ve also dredged up some interesting questions about human behaviour, and the true power of big data analytics. If a program can monitor hundreds of millions of tweets at a time, all over the world, watching trends appear and disappear, will the analyst sitting in front of it soon be able to predict the future? Can they do that now? Can pandemics be detected earlier, or can the population impacts of disasters be addressed faster? Will the military be able to thwart attacks before they start?
As Richard and I leave building M-50, I have a feeling that Scheidl and Isabelle would probably answer yes to those questions. The technology that they have proven will make a dramatic impact on defence and security in Canada for all the right reasons, but like any weapon, much care will have to be taken to ensure that it does not fall into the wrong hands.
As Einstein said, information is not knowledge. But Scheidl and Isabelle are getting very close to automating that conversion with sophisticated software. Today, they need a good analyst. Tomorrow, I’m not so sure.