Wednesday 5 March 2014

Big Data, Big Controversy

Big Data: what is it, why is it useful, and how can it be controversial? Let's start with the "what". Big Data is a very large amount of structured or unstructured data, usually (though not always) collected by an automated system. There is typically so much of it that storing or processing it in one location or on one server is almost impossible. Big Data is usually processed with methods such as data mining to extract information or statistics from the data. In other cases it is processed in parallel across multiple privately owned servers, although this is very expensive and is usually only done if reliable response speed or data privacy is a priority.

Next comes the "why": why is Big Data useful? Large organisations often use Big Data to extract information that is of great importance to their purpose, service, or product. For example, mobile companies collect Big Data about their customers' phone calls, from which they can derive usage statistics and information such as which areas see the most use. From this they can find out which areas need to be upgraded to handle a higher call volume and how best to manage their network traffic. Search engines also use Big Data to calculate their search results from databases of URLs, tags, and so on, although they usually use their own servers to store and process the data so that search times are more reliable. Big Data can even be used to work out new cures and treatments for diseases and to improve the quality of medical care, as is being proposed by the NHS in the form of Care.Data, but we'll get to that later.

Finally we come to the "how": how can Big Data be controversial? Big Data is often made as anonymous as possible. The problem, however, is that there are so many sources of Big Data out there. Although no single set of Big Data identifies you on its own, each one includes a lot of data that can be cross-referenced with other sets to build a profile. That profile can then be used to narrow down who the data belongs to, eventually even to a single person. This could be used for beneficial purposes, but it could just as easily be used for malicious ones such as aggressive advertising or identity theft.
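To make the cross-referencing idea concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the two tiny datasets, the field names (`postcode`, `birth_year`, `sex`) and the `link` function are my own assumptions, not anything taken from a real system. It simply shows how an "anonymous" record can be tied to a name when another dataset shares enough of the same details.

```python
from collections import defaultdict

# Hypothetical, invented records; no real data is used here.
# Dataset A: "anonymised" medical records (no names).
medical = [
    {"postcode": "AB1 2CD", "birth_year": 1950, "sex": "F", "condition": "dementia"},
    {"postcode": "AB1 2CD", "birth_year": 1982, "sex": "M", "condition": "asthma"},
    {"postcode": "EF3 4GH", "birth_year": 1982, "sex": "M", "condition": "diabetes"},
]

# Dataset B: a separate list that does include names (for example a marketing list).
marketing = [
    {"name": "Person A", "postcode": "AB1 2CD", "birth_year": 1950, "sex": "F"},
    {"name": "Person B", "postcode": "EF3 4GH", "birth_year": 1982, "sex": "M"},
    {"name": "Person C", "postcode": "EF3 4GH", "birth_year": 1982, "sex": "M"},
]

# Fields that are not names but, taken together, can single someone out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the combined key used to compare records across datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, named):
    """Return {name: anonymous_record} wherever the shared fields match uniquely."""
    anon_by_key = defaultdict(list)
    for record in anonymous:
        anon_by_key[key(record)].append(record)

    named_by_key = defaultdict(list)
    for person in named:
        named_by_key[key(person)].append(person)

    matches = {}
    for k, people in named_by_key.items():
        records = anon_by_key.get(k, [])
        # Only a one-to-one match pins a record to a single person.
        if len(people) == 1 and len(records) == 1:
            matches[people[0]["name"]] = records[0]
    return matches


reidentified = link(medical, marketing)
for name, record in reidentified.items():
    print(name, "->", record["condition"])
```

In this toy data only one person ends up matched, because the other two share the same postcode, birth year and sex and so cannot be told apart. The more datasets and fields that are combined, the fewer people remain hidden in a crowd like that.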

One of the most controversial cases of Big Data use is the proposed plan known as Care.Data. The NHS plans to sell anonymised medical records at £1 each. The idea is that the Big Data from the combined medical records can be analysed to find ways to improve medical services, as well as to uncover correlations relating to conditions and illnesses that can guide and improve the development of new treatments. While it is true that the data could be incredibly beneficial to the medical world, it has also been pointed out that there is little to prevent almost anyone from walking up and buying this data to do with as they please. As mentioned before, this could lead to people being matched back to their medical records using large-scale processing of Big Data from multiple sources. The resulting information could then be used to manipulate elderly and vulnerable people (for example those whose medical records show that they have learning difficulties or dementia) by focusing telemarketing or scams on them. It could also be used to increase insurance costs for people considered "high risk" due to medical conditions.

Personally, I think that Care.Data could do a lot more good than harm, but I also believe that there need to be stricter regulations on who can buy the data and for what purpose. Care.Data is a great idea that needs to be reworked so that it is not executed poorly, which is unfortunately a far too common occurrence these days.

