Liam Robinson's Internet Technology Blog: February 2014

Wednesday, 26 February 2014

OpenStreetMap: The Wikipedia of Street Maps

A few weeks ago I blogged about crowdsourcing, giving possibly the best example: Wikipedia. In theory Wikipedia shouldn't work, given the probability of errors and the likelihood of vandalism, yet it has become incredibly successful and surprisingly accurate thanks to a moderation system that stops errors and vandalism from becoming serious problems. Heavy moderation of content is clearly an effective way of making crowdsourcing viable, and there is another great crowdsourced project that shows it: OpenStreetMap. It seems to be succeeding in much the same way as Wikipedia, using similar methods of moderation.

OpenStreetMap uses crowdsourcing to map areas and publishes its data under the Open Database License, meaning that anyone is allowed to use the data in the OpenStreetMap database, even commercially, so long as they credit OpenStreetMap and share any improved version of the data under the same terms. It allows anyone to map areas, edit the information of roads and buildings, and add new buildings and roads. This has allowed the project's map of the world to grow very quickly, and it has even received some data that is not on some of the bigger companies' maps like Google Maps. This is because local people can map an area in detail, even to the extent of labelling different building sections individually with full descriptions. It has also made OpenStreetMap rather popular with small businesses, as they don't have to pay the company that owns the map to have their business added and they don't have to jump through hoops to get anything changed or any details added. Instead they can add it themselves.

An example of OpenStreetMap vandalism

Unfortunately this also means that other users can abuse the system, change whatever they want whenever they want, and mess with the map. If the system is not abused then users can easily contribute to the mapping project as originally intended, by adding useful and relevant data about houses, roads, and shops. However the main problem with the system is that people may also purposely vandalise the map by deleting landmarks or roads, renaming things to completely different places, or adding non-existent roads and buildings all over the place to confuse people. They could also do something relatively harmless like change their house's name to "Magical Wizard Castle". Another problem is that unintentional mistakes can ruin the map as well, like typing an address as "123 Johnson Road" when you meant to type "23 Johnson Road", which can leave people unable to find specific places, or accidentally adding a road out of sync with the map due to lag or clumsiness.

Fortunately the OpenStreetMap system is made to cope with the vandalism and accidental misinformation that appears. One thing that helps a lot is that the majority of the OpenStreetMap community are actually fairly responsible and will often seek out and correct any mistakes or vandalism that they see without any moderator or administrator activity being necessary. Sometimes, however, the vandalism or mistakes are not seen by the community. When this happens moderators step in by checking recent changes and looking for flags from automatic vandalism detection algorithms (in other words, the system tells them if someone makes a lot of changes or names something a vulgar word); the moderators then correct any wrong changes, either manually or by reverting the area to an earlier backup.
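To give a rough idea of what that kind of automatic flagging could look like, here is a minimal Python sketch. It is not OpenStreetMap's actual detection code; the threshold, the word list, and the function name flag_suspicious are all made up for illustration.

```python
from collections import Counter

# Hypothetical threshold and word list, chosen only for illustration.
MAX_EDITS_PER_USER = 50
BLOCKED_WORDS = {"rudeword", "swearword"}


def flag_suspicious(edits):
    """Return a dict mapping each flagged user to a sorted list of reasons.

    `edits` is a list of (user, new_name) tuples; new_name is None when
    the edit did not rename anything.
    """
    reasons = {}

    # Rule 1: one user making an unusually large number of changes.
    counts = Counter(user for user, _ in edits)
    for user, count in counts.items():
        if count > MAX_EDITS_PER_USER:
            reasons.setdefault(user, set()).add("many edits")

    # Rule 2: a new name that contains a blocked word.
    for user, name in edits:
        if name and any(word in BLOCKED_WORDS for word in name.lower().split()):
            reasons.setdefault(user, set()).add("blocked word")

    return {user: sorted(found) for user, found in reasons.items()}
```

A flag like this would only tell a moderator where to look; a person still has to decide whether the change was really vandalism.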

Overall this system is pretty effective at preventing vandalism from becoming a problem, as even large-scale vandalism that goes undetected by the automatic systems can be reported by users and is usually dealt with by a moderator within 48 hours. That being said, it is still not always as accurate or as detailed as other maps like Google Maps, although in a few specific areas it does have more information, so its accuracy and depth vary depending on what you are looking at. Despite the risk of getting inaccurate or vandalised content that comes with crowdsourcing, I am quite interested to see what OpenStreetMap will evolve into with time. If 3D modes and street views can be implemented then it could become a great openly licensed alternative for companies that currently have to pay to use Google Maps' data, and a little bit of healthy competition for the big companies is always good for the consumer, so with any luck OpenStreetMap will become a force to be reckoned with. Look out Google! Oh and there's Bing Maps... but no one cares about Bing.

Sources and related links:
https://www.openstreetmap.org/
http://wiki.openstreetmap.org/wiki/Open_Database_License
http://wiki.openstreetmap.org/wiki/Vandalism

Internet Speeds: ISPs Being Sneaky

So these days internet speeds are getting pretty fast and people are willing to pay more and more to get better and faster connections. ISPs (those guys that connect you to the internet) like Virgin Media or BT now offer speeds that are more than enough for a full family to stream HD videos from the internet all at once, at least in theory. Unfortunately for us there are many ways that ISPs mislead the majority of us into thinking that we are getting far more than we actually are.

The first and probably the most recognisable of the ways that we can be misled is the way that ISPs sell you bandwidth. They sell you bandwidth for your internet connection in megabits per second, otherwise known as Mbps or Mb/s. Nothing wrong with that, right? Unless you consider that there is a very similar term, megabytes per second, also known as MBps or MB/s (notice that the capital B is the only difference in the abbreviations). Those familiar with computers will know that there are eight bits in one byte, meaning that one Mb/s is only an eighth of the speed of one MB/s. So when you pay for an internet connection in Mb/s the ISPs make it easy to mistakenly think you are getting far more than you really are. It also doesn't help that some adverts show the connection as "50MB/s" when they really mean "50Mb/s". This is mostly down to the people who make the adverts not knowing the difference between the two, but it is still unacceptable really.
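To put some numbers on the bits versus bytes difference, here is a small Python sketch. The function names are my own, and the calculation ignores protocol overhead and congestion, so real downloads will be a bit slower than this.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(speed_mbps):
    """Convert a speed in megabits per second (Mb/s) to megabytes per second (MB/s)."""
    return speed_mbps / BITS_PER_BYTE


def download_seconds(file_megabytes, speed_mbps):
    """Best-case time to download a file of the given size at the given Mb/s speed."""
    return file_megabytes / megabits_to_megabytes(speed_mbps)


# A "50Mb/s" connection moves at most 6.25 megabytes each second.
print(megabits_to_megabytes(50))
# A 700 MB file therefore takes at least 112 seconds, not the 14 you
# might expect if you read the advert as 50 MB/s.
print(download_seconds(700, 50))
```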

ISPs also use very careful wording when quoting speeds, like in the image above. They say "Up to 60Mb", meaning that they aren't legally obliged to actually provide 60Mb/s; in fact they don't even have to give you 30Mb/s even if you're on the 120Mb/s package! In reality most people get only about half of what is offered as the "Up to" speed on average, so if you're on the 30Mb/s package you can expect roughly 15Mb/s, if you're on the 60Mb/s package you can expect roughly 30Mb/s, etc.
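As a quick illustration of that rule of thumb, here is a short Python sketch. The "about half" figure is the rough estimate from this post rather than a measured value, so the fraction is left as a parameter you can change.

```python
def expected_speed(advertised_mbps, fraction=0.5):
    """Estimate a typical speed as a fraction of the advertised "up to" speed.

    The default fraction of 0.5 is this post's rough rule of thumb,
    not an official or measured figure.
    """
    return advertised_mbps * fraction


for package in (30, 60, 120):
    print(f"Up to {package}Mb/s -> roughly {expected_speed(package):g}Mb/s")
```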

Probably the least well-known problem with internet speeds is that if you connect to a website or service, e.g. Netflix, and the website or video lags, the problem may not be your connection or the website's connection at all. Your ISP connects all of the customers on its network to the internet via a much larger ISP, which it purchases bandwidth from in bulk. The larger ISP then routes all of the traffic of the networks connected to it through connections to other large ISP networks. The trouble is that the large ISPs have to pay each other to build better connections between them so that a customer whose traffic is routed through one ISP can get better bandwidth to a website whose traffic is routed through another ISP. This means that if your ISP doesn't pay the bigger ISPs to build a better connection then the traffic between the networks gets congested and you get a slow connection to websites and people on the other ISP's network, but you still get a fast connection to those on your ISP's network.

Fortunately there is actually a work-around for this. If your ISP's ISP has a good connection to another bigger ISP that in turn has a good connection to the ISP of the website you want to get to, then you can use a VPN (Virtual Private Network) to get around your slow route and use the faster one of the other bigger ISP instead. When you connect to the VPN you can ask it to fetch the data that you want from the website using the fast, uncongested route of the VPN's ISP. The VPN can then send you the data through the fast, uncongested route between you and them. The end result is that you bypass the slow and congested route that you would normally have to take (even if it is actually shorter) and get a faster connection instead.
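The reason the longer route can win is that a route is only as fast as its slowest link. Here is a simplified Python sketch of that idea; the link speeds are numbers I made up for illustration, and real routing is a lot messier than taking a minimum.

```python
def path_throughput(link_speeds_mbps):
    """A route can be no faster than its slowest link (the bottleneck)."""
    return min(link_speeds_mbps)


def best_route(routes):
    """Return the name of the route with the highest bottleneck speed."""
    return max(routes, key=lambda name: path_throughput(routes[name]))


# Hypothetical link speeds in Mb/s, made up for illustration.
routes = {
    # you -> your ISP, congested link between the big ISPs, website
    "direct": [60, 4, 100],
    # you -> your ISP, your ISP -> VPN, VPN's ISP -> website's ISP, website
    "via_vpn": [60, 45, 80, 100],
}

for name, links in routes.items():
    print(name, path_throughput(links), "Mb/s")
print("fastest:", best_route(routes))
```

With these made-up numbers the VPN route has more hops, but its slowest link is still much faster than the congested link on the direct route.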

Personally I would like it if ISPs would just cut the crap and give me some realistic figures for the speeds I'm likely to get. But somehow I don't think that'll happen any time soon, and yelling at the poor people in the call centres isn't likely to do any good. So for now I suppose I'll have to settle for a look of general disapproval in the ISPs' direction and hope that one day some new company will come into the mix with decent customer service to slap them in the face with the metaphorical hand of healthy competition.

Also, apologies for the late posts; my internet went down for roughly 5 days last week so I couldn't do any research or fact-checking until recently. I guess Virgin Media must have some angry psychics that didn't want this post to get written and posted.

If you would like more information on this week's subject watch this: http://www.youtube.com/watch?v=NWn_BEZYpfA
Around 22:40 in the video they begin talking in some depth about the ISP to ISP problem, and I found it a really interesting resource for writing this post.

References and related links:
http://store.virginmedia.com/broadband/compare-broadband/50mb.html

Monday, 10 February 2014

Crowdsourcing: A Good Bad Idea?

The race for information on the internet is becoming more and more hectic as time progresses. This has led to crowdsourcing becoming one of the most used methods of data collection on the internet. The main reason for this is that it allows a lot of data to be collected very quickly. The trouble with this is that the data isn't always reliable.

A great example of this is Wikipedia. Wikipedia, if you somehow didn't know already, is an encyclopedia that uses crowdsourcing to get information for its articles. Anyone can write a Wikipedia article and anyone can edit it. This has allowed Wikipedia to quickly become one of the largest sources of information in the world, as it has such a large number of people contributing towards it. However this has also compromised the reliability of Wikipedia's information. Because anyone can provide or change the contents of Wikipedia's articles, it is easy for people to vandalise the content, in some cases deleting it entirely. In other cases it has caused arguments between users, which has led to them editing articles back and forth to be how they wanted them, leading to inconsistencies in the article and fluctuations from day to day.

So the main advantage of crowdsourcing is that it allows anyone to contribute. Unfortunately the main disadvantage of crowdsourcing is also that it allows anyone to contribute. How is this remedied? How has Wikipedia become so successful? The answer is a combination of moderation and data backups that results in the crowdsourced system being incredibly fault-tolerant. By using moderators to regulate what users can and cannot do, and by allowing the moderators to reverse changes made by users, the number of people that have to be trusted in order for the system to work is reduced greatly. That number is then reduced further by implementing a hierarchy of higher-level moderators all the way up to an administrator.
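The "data backups" half of that answer boils down to never throwing an old version away. Here is a minimal Python sketch of the idea; the Article class is my own toy example, not how Wikipedia actually stores its pages.

```python
class Article:
    """A toy article that keeps every revision so bad edits can be undone."""

    def __init__(self, text):
        self.revisions = [text]

    @property
    def current(self):
        """The text readers see is always the latest revision."""
        return self.revisions[-1]

    def edit(self, new_text):
        """Anyone can edit; the old text stays in the history."""
        self.revisions.append(new_text)

    def revert_to(self, index):
        """Restore an earlier revision by re-adding it as the newest one.

        The vandalised revision is kept in the history rather than
        erased, so there is still a record of what happened.
        """
        self.revisions.append(self.revisions[index])


page = Article("Paris is the capital of France.")
page.edit("Paris is the capital of cheese.")  # vandalism
page.revert_to(0)                             # a moderator steps in
print(page.current)
```

Because a revert is just another revision, a mistaken revert can itself be undone in exactly the same way.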

Even with measures in place to prevent the chaos that is the internet from ruining a crowdsourced project, some things still manage to slip through the metaphorical net of user reports and moderators. For the most part, though, the system works and is generally far more efficient than any other system out there; some studies even show that Wikipedia is on par in terms of accuracy with other encyclopedias like the Encyclopaedia Britannica. That being said, crowdsourcing still requires a lot of resources, like moderators and powerful servers to receive input from so many users at once.

Personally I think that crowdsourcing is the way forward. It allows large amounts of data to be collected quickly and at a relatively low cost compared to other methods of data collection. Admittedly some of that data may be rubbish provided by people who just want to watch the world burn, but so long as a responsible person filters the data at the end of the day, it's not half bad.

Sources:
http://news.bbc.co.uk/1/hi/technology/4530930.stm
http://www.ibtimes.com/wikipedia-study-says-its-accurate-280135
http://en.wikipedia.org/wiki/Main_Page

Tuesday, 4 February 2014

Anonymity and Privacy On The Internet

These days on the internet it has become almost impossible to remain anonymous. Almost every site you visit will want to track you in some way or another. This is usually for advertising purposes and is most commonly done by using cookies. Most of the time they just store your approximate location, a list of visited sites, and a list of keywords. These are then used to show you targeted advertisements that you are more likely to be interested in. Not particularly sinister, right? However when you consider the implications of this it is rather disturbing. It is the internet equivalent of them opening up your mail and having a rummage for anything interesting. The worst part is that this is only roughly the limit of what the "legitimate" companies do when tracking you and looking at your data. There are also websites out there that can place things called supercookies on your computer (unfortunately some "legitimate" companies also do this, although it is rare).

Supercookies are cookies that are really difficult, or in some cases downright near impossible, to get rid of. With regular cookies you have the option to clear them from your computer so that they will stop collecting data about you; supercookies, on the other hand, spread themselves to every place on your computer that they possibly can. If you try to delete the supercookie and a copy still exists anywhere else on your machine, then the second that remaining copy is activated it will restore every copy that you had previously deleted. It's like a cancerous tumour: unless you get rid of it all it will just grow back.
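Here is a toy Python simulation of that "grow back" behaviour, just to show why deleting some of the copies isn't enough. The storage names and the respawn function are made up for illustration; real supercookies run inside the browser, not as a script like this.

```python
def respawn(stores, key="tracking_id"):
    """Copy any surviving tracking ID back into every storage location.

    `stores` maps a storage name to a dict standing in for that storage.
    Returns the restored ID, or None if every copy was really deleted.
    """
    surviving = next((s[key] for s in stores.values() if key in s), None)
    if surviving is not None:
        for store in stores.values():
            store[key] = surviving
    return surviving


# Hypothetical storage locations, each holding a copy of the same ID.
stores = {
    "http_cookie": {"tracking_id": "abc123"},
    "local_storage": {"tracking_id": "abc123"},
    "plugin_storage": {"tracking_id": "abc123"},
}

# The user clears two of the three places...
del stores["http_cookie"]["tracking_id"]
del stores["local_storage"]["tracking_id"]

# ...but one surviving copy is enough to bring the others back.
restored = respawn(stores)
```

Only when every single copy is gone at the same time does respawn have nothing left to restore from.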

Another method that websites can use to track you as an individual is your browser. You may use a very common browser like Chrome or Firefox, but there are many different ways they can narrow down who you are by the parts of your browser and system. Use any plugins? Have any custom fonts installed? They can check to see how many people have that exact combination of plugins, system fonts, browser, and many other things. The end result is that your fairly unique browser setup can be used to track you as a unique user, and even if your browser is fairly generic they can still check your system fonts, operating system version, etc. There's almost no escaping it. This tracking method means that they can even track you through proxies.
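To show how a combination of ordinary settings turns into an identifier, here is a minimal Python sketch that hashes a set of browser attributes into one fingerprint. The attribute names and values are made up for illustration, and real trackers collect many more signals than this.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of browser and system attributes into one identifier.

    Sorting the keys means the same setup always gives the same hash,
    no matter what order the attributes were collected in.
    """
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, made up for illustration.
my_browser = {
    "browser": "Firefox 27",
    "os": "Windows 7",
    "fonts": ["Arial", "Comic Sans MS", "My Custom Font"],
    "plugins": ["Flash", "Java"],
}

# Same machine, but one extra font installed.
with_new_font = dict(my_browser, fonts=my_browser["fonts"] + ["Another Font"])

print(fingerprint(my_browser))
print(fingerprint(with_new_font))
```

Notice that nothing here depends on your IP address, which is why hiding behind a proxy doesn't change the fingerprint.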

So now comes the big question: Is it actually possible to be truly and absolutely anonymous on the internet? The short answer is no. There is no 100% certain way of guaranteeing that you cannot be tracked on the internet in some way, shape, or form short of cutting yourself off from 99% of the features of all websites (in which case you may as well not be using the internet at all). However there are ways to make it very difficult (and in some cases logistically unviable) for websites and people to track you, and to trick them into receiving at least some false data. First of all you should disable things like Flash and Java; if you need them then set them to run only on demand, so that if you need to use a website feature that requires Flash you can manually unblock that Flash element on the page without unblocking any others. Secondly you can use a VPN (Virtual Private Network) or a proxy like Tor to mask your IP address and make websites think that it is different, although this doesn't always work if the site asks your PC to provide its IP address using something like Java. Finally, another thing that you can do to feed websites false information instead of yours is to run your web browser in a virtual machine. This way you can have all of your custom fonts, browsers, etc. on your computer but it will only report those on the VM; it will also report your operating system as being that of the VM. There are also other methods of reducing the amount you are tracked, like using plugins to force HTTPS on all sites and filter out specific cookies.

These methods on their own may not stop someone if they are already specifically trying to track you; however, amongst the millions of people on the internet these methods make it pretty hard to identify you. Our greatest defence against tracking is numbers (especially when it comes to multiple people connecting through the same VPN). There are too many people on the internet to all be tracked individually, and so it is usually easy to hide within the crowd by making yourself harder to track than the others. This makes it logistically inefficient to track you and gather any specific personal data, as they could probably track 3 other people before they could find your IP through a proxy like Tor, if they could find it at all. If nothing else you can inconvenience the companies that try to track you and increase the cost of the processing power required for their nosiness.

Personally I believe that anonymity on the web, to a certain extent, is a good thing. There are many who believe that because anonymity can be used for illegal purposes, a breach of personal privacy is justified. I find it odd that just because it's on the internet a breach of personal privacy is seen as rational and acceptable. We wouldn't want people watching over our shoulders constantly in real life as we read magazines, read our mail, or have a conversation with a friend. What makes people think that we would be OK with it just because we are on the internet?

Sources and related links:
https://abine.com/people_guide.php
http://www.extremetech.com/computing/168418-microsoft-google-working-on-super-cookies-to-track-your-behavior-everywhere
http://weknowmemes.com/tag/but-as-a-bundle-we-form-a-mighty-faggot/