IBM BSRE 2013:

New insight into the future of IT

The US has been monitoring enormous amounts of data through the PRISM program, while the Norwegian company Coop sends out marketing offers so customer-specific that people complain. Have you ever wondered how they manage to analyze and act on such large amounts of data?

Torbjørn Pedersen is a PhD student in the Intelligent Drilling project at the Norwegian University of Science and Technology (NTNU). He is also the president of the interest organization for doctoral candidates at NTNU (DION). His fields of study include automation, control systems, system identification, modeling, and simulation.

A visit to IBM gave me new insight into the large emerging market of big data analysis.

Between the 1st and 3rd of July, 75 students from all over Europe and the Middle East attended this year’s Best Student Recognition Event (BSRE) at IBM Montpellier. I was one of two students going from the Norwegian University of Science and Technology (NTNU), and one of four students going from Norway. The event consisted of three days packed with presentations, tours, food, and fun.

However, most interesting for me were IBM’s predictions on how our (near) future will change due to advances in information technology. For the last 20 years, IBM has been granted more US patents than any other company, so they should know what they are talking about.

During BSRE 2013 there was a lot of focus on what IBM considers game-changing technologies, including automatic data analysis, self-learning systems, cognitive computing, cloud services, big data, and mobile computing. IBM is working on what needs to be done to facilitate these advances (e.g. better storage, more efficient computing), and how the products should be used by the customers (e.g. smart cities and smart industry initiatives). It was interesting to see how IBM is now branding itself mostly as a pure software and service company, rather than a computer company.

One of IBM’s key focus areas is analysis of unstructured data. Unstructured data is data that is not stored in a conventional database or any other structured format. Some examples are raw text (logs, PDFs, reports, etc.), images, sound, and video.

IBM is already offering software capable of finding and analyzing unstructured data, building context between the pieces of information, rating the quality of the sources, and reporting on selected performance criteria. This information may also be coupled with predictive and statistical models, which can be improved as the systems learn. You may be familiar with the IBM Watson project, where a computer plays Jeopardy, and usually wins against champion human players.

So, what does this mean for you? It means that it is already possible for companies to analyze all their combined data records on you and all their other customers. This makes it possible to look for patterns related to what makes a customer cancel his cellphone subscription, identify high value customers, predict when someone will look for a new product, and much more.

For example, a telephone service provider can analyze how much you use your cellphone, how many times this week you had a dropped connection, how old your cellphone is, if you have called customer support, how many friends you have (based on call logs), and predict what the business impact will be if you (and possibly your friends) leave the company. All this information can then be coupled with what other customers have done in similar circumstances. If you are a good customer, they may take pre-emptive measures, like offering you a reduced price on your subscription, before you actually leave.
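To make this concrete, here is a minimal, hypothetical sketch (in Python, using scikit-learn) of how features like these could feed a simple churn model. The feature names, toy numbers, and model choice are my own illustrative assumptions, not IBM’s actual approach.

```python
# Hypothetical churn-prediction sketch on hand-made features.
# The feature names and toy data are assumptions for illustration only;
# a real system would use far richer data and models.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Each row: [minutes used last month, dropped calls this week,
#            phone age in months, support calls, distinct contacts]
X = np.array([
    [620, 1, 10, 0, 35],   # heavy user, new-ish phone      -> stayed
    [180, 7, 30, 3, 12],   # many dropped calls, old phone  -> churned
    [450, 0,  6, 1, 50],   # well-connected, happy          -> stayed
    [ 90, 5, 28, 4,  8],   # low usage, frustrated          -> churned
    [300, 2, 14, 0, 22],   #                                -> stayed
    [120, 6, 24, 2, 10],   #                                -> churned
])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = cancelled subscription

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Score a current customer: the probability of leaving can be used to decide
# whether a pre-emptive retention offer is worthwhile.
current_customer = np.array([[150, 6, 26, 3, 40]])
churn_risk = model.predict_proba(current_customer)[0, 1]
print(f"Estimated churn risk: {churn_risk:.0%}")
```

In practice such a model would be trained on millions of customer records and far richer features, but the principle is the same: score each customer’s risk of leaving, and act on the high-value, high-risk ones.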

It is also possible to analyze maintenance and production data to reduce product and service costs. A car manufacturer can predict that if five cars from the same factory and production line fail due to a common error, then other cars from that line will probably fail too. One can then do pre-emptive maintenance on those cars before it happens.
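As a toy illustration, the rule described above could be sketched like this in Python; the threshold, error codes, and data are assumptions made up for the example.

```python
# Hypothetical sketch: if several cars from the same factory and production
# line fail with the same error code, flag the remaining cars from that line
# for pre-emptive maintenance.
from collections import Counter

FAILURE_THRESHOLD = 5  # assumed threshold, as in the example above

# (factory, production_line, error_code) for each reported failure
failures = [
    ("Plant A", "Line 2", "E42"), ("Plant A", "Line 2", "E42"),
    ("Plant A", "Line 2", "E42"), ("Plant A", "Line 2", "E42"),
    ("Plant A", "Line 2", "E42"), ("Plant B", "Line 1", "E17"),
]

# Cars still on the road, keyed by (factory, production_line)
fleet = {
    "VIN001": ("Plant A", "Line 2"),
    "VIN002": ("Plant A", "Line 2"),
    "VIN003": ("Plant B", "Line 1"),
}

suspect_batches = {
    (factory, line)
    for (factory, line, error), count in Counter(failures).items()
    if count >= FAILURE_THRESHOLD
}

recall = [vin for vin, batch in fleet.items() if batch in suspect_batches]
print("Call in for pre-emptive maintenance:", recall)
```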

The public sector can use the data analysis tools to analyze traffic systems, crime, effectiveness of medical treatments, and to plan maintenance of electrical and water systems, leading to better resource utilization and pin-pointing of problem areas.

However, the advances in data analysis also have some potentially troublesome consequences. The same systems can be used to find which customers are most likely to buy a product even if they do not need it, or to screen out insurance customers or job applicants who are likely to get sick. One should be aware that there is (almost) no limit on how much data can be analyzed, and that the cost of doing the analyses is dropping rapidly.

So, two increasingly important questions are: who has which data about me, and who will have access to that data tomorrow?

A bank, which is often also an insurance company, probably has data on which months you have received reduced payments, and from whom. They now have tools to automatically search all your financial data, loan applications, and even data from your family (if they use the same bank), and couple this with insurance data to look for patterns in groups of customers who are likely to become a high cost (or high revenue) in the future, for example due to sickness. One should not forget that knowledge (data) is power and money, more now than ever before.

Real-time data analysis is also an important part of modern security solutions. We need to detect whether any users or computers connected to our network, directly or indirectly, are a threat to the system. Did you know that IBM is the world’s 3rd largest data crawler, cataloguing dark-net computers (e.g. bot-net machines)? The main purpose is to detect security breaches, for example by detecting communication with suspicious computers, or usage of computers at odd hours or when the user is supposed to be in meetings. These systems can also detect attacks from insiders by analyzing whether anyone is acting in a suspicious way, for example by opening a lot of files not related to their job description. Such analysis could (maybe) have caught Edward Snowden before he was able to copy out as much confidential information as he did.
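One simple, hypothetical way to flag the "unusual number of opened files" signal mentioned above is to compare each user’s activity against their own historical baseline. The data and the three-standard-deviation threshold below are illustrative assumptions, not how IBM’s products actually work.

```python
# Hypothetical insider-threat sketch: compare each user's file accesses today
# against their own historical baseline and flag large deviations.
from statistics import mean, stdev

# Files opened per day over recent weeks, per user (made-up numbers)
history = {
    "alice": [12, 15, 11, 14, 13, 12, 16],
    "bob":   [30, 28, 33, 31, 29, 32, 30],
}

# Files opened today
today = {"alice": 14, "bob": 240}

for user, counts in history.items():
    baseline, spread = mean(counts), stdev(counts)
    # Flag anyone more than 3 standard deviations above their own baseline
    if today[user] > baseline + 3 * spread:
        print(f"ALERT: {user} opened {today[user]} files today "
              f"(baseline ~{baseline:.0f}) - possible insider activity")
```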

It could even be used to screen for unwanted behavior, e.g. by analyzing data logs and traffic to outside computers, to ensure that employees do not spend a lot of time reading web comics, or slander colleagues on the chat messaging system.

Understanding all your data, and its relationship to other data, is important for all businesses and governments. As the amount of information continuously increases, we need good automatic tools to do the job. This will lead to more efficient resource usage, customized products, customized marketing, lower costs, and new and improved services. However, this may come at the price of more surveillance, more inequality, and less privacy.

It remains to be seen how large the impact of data analysis and predictive models will be. It has been said that “once we know the number one, we believe that we know the number two, because one plus one equals two. We forget that first we must know the meaning of plus”. Equally, it may be very hard to discard irrelevant data, but IBM and others are trying.

If you get a chance to travel to one of these events, I strongly encourage you to take it. You meet a lot of interesting people and gain insight into how the business world is evolving.