
The "data", they do speak! We are not listening!


Let's begin with the conclusion: despite our already broad experience with artificial intelligence and machine learning applications, we still have a long way to go. We have to learn to use the relevant technologies and to accept the "uncertainty" of the proposals computers make to humans. Consider the following three statements that have emerged from Big Data processing:

  • Globally, 4.3% of children die before reaching the age of 15. That is 5.9 million children dying every year. So the World is "a mess".

  • In the past, about 50% of children died before that age. Today the figure has dropped to 4.3%. So the World is "better" today than before.

  • In the European Union, only 0.45% of children die before the age of 15. So the World can still be "much better".

(source: Our World in Data).


All three of the above statements, all findings of Big Data, are correct. It is up to us what use we make of the results. If we are going to take action, we suggest choosing all three. Which one would you choose?
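
A quick back-of-the-envelope check shows how these figures fit together. The minimal Python sketch below derives the implied size of the annual child cohort from the quoted rates; the derived values are rough estimates, not figures from the source:

    # Quick arithmetic check on the figures above. The two rates are quoted
    # from Our World in Data; the derived values are implied estimates.
    deaths_per_year = 5_900_000   # children dying before age 15, per year
    global_rate = 0.043           # 4.3% global child mortality
    eu_rate = 0.0045              # 0.45% child mortality in the European Union

    implied_cohort = deaths_per_year / global_rate   # ~137 million children/year

    # If the whole World matched the EU rate, annual deaths would fall to:
    deaths_at_eu_rate = implied_cohort * eu_rate     # ~620,000 per year

    print(f"Implied annual cohort: {implied_cohort:,.0f} children")
    print(f"Deaths at the EU rate: {deaths_at_eu_rate:,.0f} (vs. {deaths_per_year:,} today)")

In other words, bringing the global rate down to the EU level would mean roughly a tenth of today's child deaths. That is what "much better" amounts to in numbers.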


Artificial Intelligence is not new. For at least two decades now, we have had applications where Computers perform the duties of Humans: deciding on actions (see Stock Market Trading), assisting in decision making (see IBM Watson in cancer treatment), or revealing hidden trends and phenomena (see consumer or citizen behavior).


Two issues are worth mentioning: first, that we sometimes view the results of Big Data analyses with distrust; and second, that managing AI infrastructures is an expensive business.

The "data" speak and give useful results. But we need to learn to use the relevant technologies, adopt methods to manage false results and accept the need for continuous investment in Machine Learning and AI.

So we are suspicious and hesitant to accept the findings of Big Data analyses. This is because the predictions are not certain: they are not 100% accurate, or even 95%. In statistics we have been taught to build and use models whose accuracy approaches 95%, i.e. with a statistical error of 5% or a little higher. In artificial intelligence, accuracy can fall to 80% or even less. In other words, we are asked to make decisions based on models that have a relatively high probability of being wrong. Of course, algorithms must be continuously updated to improve their accuracy, but false positives and false negatives will always remain a fact of life. In the business world, a 20% chance of error in a decision is high. For this reason, managers are often reluctant to accept it; they become frustrated and reject the recommendations of the computer models. The solution, however, is not frustration and avoidance, but the adoption of methods to manage false results.
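
What "managing false results" can mean in practice is illustrated by the minimal sketch below. It splits an 80%-accurate model into its false positives and false negatives and prices them separately; all counts and costs are hypothetical assumptions, not figures from the article:

    # Minimal sketch: an "80% accurate" model hides two different kinds of
    # error, which usually carry very different business costs.
    # All counts and costs below are hypothetical assumptions.
    tp, fp = 450, 120   # predicted positive: correct / incorrect
    tn, fn = 350, 80    # predicted negative: correct / incorrect

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total            # 0.80 with these counts
    false_positive_rate = fp / (fp + tn)    # ~26%
    false_negative_rate = fn / (fn + tp)    # ~15%

    cost_fp = 50      # e.g. a wasted follow-up contact
    cost_fn = 500     # e.g. a real problem we failed to flag
    expected_cost = fp * cost_fp + fn * cost_fn

    print(f"accuracy={accuracy:.0%}  FPR={false_positive_rate:.0%}  "
          f"FNR={false_negative_rate:.0%}  expected cost={expected_cost:,}")

Managing false results then becomes a concrete exercise: tune the model toward whichever error is cheaper, rather than rejecting its recommendations outright.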


Let's take the example of investigating Customer Sentiment, using Natural Language Processing, through the comments customers post on social networks. After Big Data Analytics processing, the probability of a false reading can reach as high as 25%. That is, a sentiment may be classified as positive when it is in fact negative, and vice versa. With a 25% probability of being wrong, it is very difficult to get approval for an action plan to improve the quality of customer service. Despite this, the blunt question arises: is it better to do "nothing", or to do "something" with 75% data accuracy? Management usually answers this question depending on its appetite for risk-taking and its openness to new tools.
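
To make the "nothing versus something" question concrete, here is a minimal expected-value sketch. Only the 75% accuracy comes from the example above; the gain and loss figures are hypothetical assumptions:

    # Minimal sketch: is acting on 75%-accurate sentiment data better than
    # doing nothing? Gain and loss figures are hypothetical assumptions.
    p_correct = 0.75        # probability the detected sentiment is right
    gain_if_right = 40_000  # value of an improvement aimed at a real issue
    loss_if_wrong = 15_000  # cost of acting on a misread sentiment

    ev_act = p_correct * gain_if_right - (1 - p_correct) * loss_if_wrong
    ev_do_nothing = 0.0     # no gain, no loss (ignoring the risk of churn)

    print(f"Acting: {ev_act:,.0f}  vs. doing nothing: {ev_do_nothing:,.0f}")
    # 0.75 * 40,000 - 0.25 * 15,000 = 26,250 > 0: under these assumptions,
    # "something" beats "nothing"; a different risk appetite changes the numbers.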


Another issue is the level of investment in Big Data and AI. Management often believes that a modest spend in euros or dollars will buy it AI applications. At the same time, its expectations skyrocket: it expects miracles from AI. This is, by and large, a great misconception. The bigger the expectation and the vision, the more capital they will require. AI also demands ongoing investment; the scenario of "one-and-done" spending, after which the system runs by itself, simply does not exist.


One example, admittedly large-scale, worth mentioning in this direction is the IBM Watson system[1], which has been used for many years in the treatment of various forms of cancer. The system ingests millions of data points from thousands of databases, medical studies, encyclopedias, articles, written medical reports, and so on. It has cost many billions of dollars and has gone through a lot to reach its current form. Despite its undoubted contribution to the anti-cancer effort, IBM was forced a few years ago to cut its budget, which had reached heights that even IBM could not afford. The quality of the system's decisions and recommendations had also begun to decline, mainly because doctors and researchers, the system's data providers, had grown weary of the burden and cost of maintaining it. In short, funds were lacking, both for the owners of the algorithms and for their users. Much the same thing happens, perhaps on a smaller scale, in thousands of other applications, mostly commercial, which are run in a spirit of capital restraint.


It seems that we have a long way to go. In no way can we speak today of the dominance of computers and their findings. The quality of the results depends, of course, on the investment made, but also on the judgement exercised in the preparation and processing of the data fed to the models.



[1] https://www.nytimes.com/2021/07/16/technology/what-happened-ibm-watson.html
