Anton Cohen

Sunday, October 23, 2016

Python generators with multiple threads

Sometimes when writing Python there is a CPU expensive task that needs to be done for every item in an iterator. You might go to Google and search for python multithreading, which might take you to the threading module. But that isn't very useful, so next you look to the multiprocessing module. That looks more promising, but all the examples have managers or queues or pipes or some other complicated stuff.

I want simple. All I want to do is write a generator that yields something, have that something passed to a thread or process to compute the result, then have a generator yield the result when it is ready. Sounds simple. Unfortunately it is hard to find examples of how to do this in Python.

So here are some examples!

In the example below increasing_sleeps() is the function that yields something and rand_sleep() is the function that takes a long time to process each thing yielded by increasing_sleeps(). It uses multiprocessing.dummy to use threads instead of full processes.
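A minimal sketch of this pattern, under a few assumptions: the function names come from the text, while the run() wrapper, the worker count, and the scale parameter (which shrinks each sleep so the demo finishes quickly) are made up for illustration.

```python
import random
import time
from functools import partial
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, backed by threads


def increasing_sleeps(count=10):
    """Yield the work items: ever larger upper bounds for the sleep time."""
    for max_sleep in range(count):
        yield max_sleep


def rand_sleep(max_sleep, scale=1.0):
    """Stand-in for slow work: sleep a random whole number of seconds.

    scale is a hypothetical knob that multiplies the real sleep time.
    """
    slept = random.randint(0, max_sleep)
    time.sleep(slept * scale)
    return slept


def run(count=10, workers=4, scale=1.0):
    """Yield each result as soon as a worker thread has finished it."""
    pool = Pool(workers)
    try:
        work = partial(rand_sleep, scale=scale)
        for slept in pool.imap_unordered(work, increasing_sleeps(count)):
            yield slept
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # scale=0.01 turns each "second" into 10 ms so the demo is quick
    for slept in run(scale=0.01):
        print("Slept %d seconds" % slept)
```

Pool.imap_unordered() consumes the input generator and hands results back as they complete, which is why run() can itself be a plain generator.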

The results will look something like this:

Slept 0 seconds
Slept 0 seconds
Slept 1 seconds
Slept 1 seconds
Slept 4 seconds
Slept 4 seconds
Slept 3 seconds
Slept 5 seconds
Slept 1 seconds
Slept 4 seconds

Cool! But what if I need to know what input was sent to rand_sleep() so I can correlate the results with the input? Simple, just return the input from rand_sleep() along with the result. The example below does that correlation, and also uses full Python processes instead of threads. Threads are cheaper, but there are cases where you will need to use processes to get around the Global Interpreter Lock (GIL). One big gotcha with using full processes is that Python needs to pickle the function being sent to the new process (it doesn't share memory like with threads), and it can't pickle lambdas or nested functions (nor, on Python 2, instance methods). Only functions defined at the top level of a module pickle cleanly. It can also pickle instances of custom classes, so you can make a callable class and pass that instead of a function.
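A minimal sketch of the process-based version, assuming a callable class named RandSleep (a hypothetical name) that returns the input together with the result. As before, run() and scale are illustrative additions.

```python
import random
import time
from multiprocessing import Pool  # real processes this time


def increasing_sleeps(count=10):
    """Yield the work items: ever larger upper bounds for the sleep time."""
    for max_sleep in range(count):
        yield max_sleep


class RandSleep(object):
    """Callable class; its instances pickle because the class is module-level."""

    def __init__(self, scale=1.0):
        self.scale = scale  # hypothetical knob that multiplies the sleep time

    def __call__(self, max_sleep):
        slept = random.randint(0, max_sleep)
        time.sleep(slept * self.scale)
        # Return the input next to the result so the caller can correlate them
        return max_sleep, slept


def run(count=10, workers=4, scale=1.0):
    """Yield (input, result) pairs as worker processes finish them."""
    pool = Pool(workers)
    try:
        for pair in pool.imap_unordered(RandSleep(scale), increasing_sleeps(count)):
            yield pair
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for max_sleep, slept in run(scale=0.01):
        print("Slept %d seconds of max %d" % (slept, max_sleep))
```

The `if __name__ == "__main__"` guard matters here: on platforms that start workers by spawning a fresh interpreter, the module is re-imported in each worker, and unguarded pool code would run again.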

The results will look something like this:

Slept 0 seconds of max 0
Slept 1 seconds of max 1
Slept 1 seconds of max 2
Slept 0 seconds of max 3
Slept 3 seconds of max 5
Slept 0 seconds of max 6
Slept 4 seconds of max 4
Slept 2 seconds of max 8
Slept 5 seconds of max 7
Slept 8 seconds of max 9

Nice! That makes it clear that the results are coming back out of order. If you want the results to come back in order, use Pool.imap() instead of Pool.imap_unordered().
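A small sketch of the difference, assuming a hypothetical sleep_for() worker and fixed delays so the behaviour is easy to see:

```python
import time
from multiprocessing.dummy import Pool  # threads keep the demo light


def sleep_for(seconds):
    """Sleep for the given time, then hand the input back."""
    time.sleep(seconds)
    return seconds


def compare(delays, workers=3):
    """Return (results from imap, results from imap_unordered)."""
    pool = Pool(workers)
    try:
        ordered = list(pool.imap(sleep_for, delays))
        unordered = list(pool.imap_unordered(sleep_for, delays))
    finally:
        pool.close()
        pool.join()
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = compare([0.3, 0.1, 0.2])
    print(ordered)    # always in input order
    print(unordered)  # completion order, typically shortest sleep first
```

Pool.imap() still runs the work in parallel. It just holds finished results back until everything before them has been yielded.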

And finally, here is an example of a simple yet useful use of these methods:

Tuesday, June 4, 2013

Role-based Puppet with Hiera

This is a method of configuring machines with Puppet based on their roles. It's similar to Jordan Sissel's nodeless Puppet in that there are no individual node declarations, just a default node that assigns one class. That one class is responsible for including all the other classes based on what 'roles' are assigned to the node. One difference between this role-based Puppet and Jordan's nodeless Puppet is that I'm not using an ENC. Instead of an ENC I'm using Hiera to assign roles and specify custom parameters for the nodes. I put together a demo of role-based Puppet as a Vagrant configuration, which you can get on GitHub.

Here is what the default node looks like:

The role class searches Hiera for 'roles' and includes them:

In Hiera multiple roles can be assigned, and data can be added to customize the roles. In this case we are adding the 'role::lb' load balancer role along with 'role::base', and specifying how much memory memcached should use and how many nginx processes should run.

The roles then include the actual classes that do the work, in this case nginx and memcached. In the real world the role would probably do a lot more, like configure sysctl. The role can pass site-specific parameters to the classes, but it can also rely on Puppet to look up parameters in Hiera.

Site-specific roles and generic modules

Every site is different, but there is no reason that every company should have its own nginx module. Modules should be generic and customizable; they shouldn't make assumptions about how you want to configure something. Roles are site-specific; they bring together modules in the way your site uses them. Data in Hiera can then customize modules or roles in node-specific ways.

For example, puppetlabs-nginx (and jfryman's nginx module that it's based on) configures a yum repo hosted on nginx.org, and neither module lets you specify custom templates for configuration files. That means I would have to fork and customize the Puppet module if I wanted to use my own yum repo with a version of nginx compiled with ngx_pagespeed. It shouldn't be that way. I forked puppetlabs-nginx and added parameters to use custom templates, and I'm going to continue working on it to make it more generic while still being easy for the 90% use case. I would like Puppet to be as simple as Ansible for the easy cases, and still be efficient and powerful when you need to customize.

Monday, June 3, 2013

How to fix Droid 4 random shut off

Ever since I got my Motorola Droid 4 it has randomly turned itself off multiple times a day. It's not a software crash, it's not the power button getting stuck down or bumped in my pocket. It's the stupid battery! The Droid 4 has a fixed battery that you can't remove, yet somehow they made it so the battery could easily lose its connection.

To test this I removed the back cover; you need to stick a pin in the hole at the top right of the back to remove it. With the cover off I poked at the battery and found that if I pressed the top left of the battery, the bottom right would lift up and lose its connection, and the phone would shut off.

To fix this I folded up some paper and taped it to the bottom of the battery and replaced the back cover. My phone hasn't powered off in days!

If you try this you might want to pick some materials that aren't flammable, I just used what I had lying around.

Saturday, May 25, 2013

GoDaddy's crippled NetApp filers

I just finished helping my friend Adam Aragon with an issue on his AntiPretty [NSFW-ish] alternative modeling site. It's a WordPress site running on GoDaddy. He was getting an error every time he tried to upload new images. Directory permissions were fine, as was all the PHP and WordPress stuff. GoDaddy support couldn't find a problem. So I logged in with SSH and tried to 'touch' a new file in the directory.

$ touch /path/wp-content/uploads/2013/05/testfile1
touch: cannot touch `/path/wp-content/uploads/2013/05/testfile1': File too large

Phishing, a failure of OAuth

Someone I know was recently the victim of a phishing attack. They found out when a ton of their contacts sent emails asking "what's this link you sent me?". It turns out they sent out an email that said:

"You might be interested in this properties Click Here to view listing"

The email went out to a lot of people, possibly everyone in the address book. So how did this happen?

The day before this happened my friend received the same email, with the same link, from a friend of his. The email looked legit; it had his friend's full signature, with phone number and street address. And hell, he's in the housing industry, so looking at properties for sale isn't weird. So my friend clicked the link. He was presented with a page that looks like this:

It looks pretty normal. Lots of sites ask users to log-in with credentials. My friend dutifully provided his Gmail credentials, and was promptly redirected to remax.com. It all seemed pretty normal.

And there we have a failing of OAuth. My friend knew he should never provide his password to a third-party site. But the prevalence of sites asking users to "Log-in with Facebook" or "Log-in with Google" has conditioned users to think it's normal to give third-party sites your password. Of course OAuth doesn't provide third-party sites with your password. With real OAuth the third-party site should redirect you to your authentication provider, so for example it should say google.com in your address bar. Once you authenticate, google.com will redirect you back to the third-party site and provide that site with a special token.

The problem is users don't know that the URL bar should say google.com. Worse yet, on mobile devices the log-in pages will often be in a WebView without a URL bar, and even if you could see the URL there is no way to trust it. Even if users know the URL bar should say google.com, they have been so conditioned to OAuth that they are likely to just go through the process without thinking to look at the URL.

So that's a problem. OAuth is better than providing your password to a third-party, and having a few main authentication providers is better than the password proliferation of having a password for every site. But it's still a problem.

What's the solution? The obvious one is to look at the address bar. But that isn't good enough. Sometimes the address bar won't be there. More than that, it's way too easy to get lulled into a sense of normality and fall for a phishing scam.
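For what it's worth, the check a careful user makes by eye can be sketched in a few lines of Python 3. The allow-list and the function name are hypothetical, and this only illustrates why look-alike URLs fail an exact host comparison. It does nothing for the WebView case where there is no URL to inspect.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real one depends on which providers you use.
TRUSTED_LOGIN_HOSTS = {"accounts.google.com"}


def is_trusted_login_url(url, trusted_hosts=TRUSTED_LOGIN_HOSTS):
    """Return True only for HTTPS URLs whose host exactly matches the allow-list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in trusted_hosts


if __name__ == "__main__":
    for url in (
        "https://accounts.google.com/o/oauth2/auth",
        "https://accounts.google.com.evil.example/login",  # look-alike subdomain
        "https://accounts.google.com@evil.example/login",  # userinfo trick
    ):
        print(url, is_trusted_login_url(url))
```

Comparing the whole hostname, rather than checking whether the URL merely contains "google.com", is what rejects the two look-alike examples.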

Two-Factor Auth

As much as passwords suck, we aren't getting rid of them anytime soon. Right now the best solution to reduce the impact of phishing attacks is Two-Factor Authentication. As the name suggests, 2-factor auth requires two forms of authentication. It's like being asked for two forms of picture ID. Usually you want the two forms of ID to be different in a way that makes it hard for someone else to get both, for example something you know and something you have. The most common form of 2-factor auth asks for a username and password (something you know) and a time-limited code that comes from a physical device (something you have).

The best example of two-factor authentication is Google (Gmail, Google Apps, etc.), explained in pretty pictures here, and here. When you log into Google it will ask for your username and password, then it asks for a special code. The code can be sent to your phone via SMS, come from a voice call (Google calls you), or come from a mobile app called Google Authenticator, which is an open source app that any site can use. Google Authenticator is a great implementation of two-factor auth tokens, and I wish more sites used it or similar software instead of text messages. Right now it can be used with Google, AWS, Facebook, Dropbox, Microsoft, and any others that support TOTP.
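To make the time-limited code less magical, here is a minimal sketch of TOTP as specified in RFC 6238 (built on HOTP from RFC 4226), assuming the common defaults of HMAC-SHA1, a 30 second step, and 6 digits. It is an illustration of the algorithm, not the code of Google Authenticator or any particular site. The function names are my own.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(key, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a raw byte key."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble picks the window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret_b32, now=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238) for a base32 shared secret."""
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)  # restore padding if it was stripped
    key = base64.b32decode(cleaned)
    if now is None:
        now = time.time()
    # The counter is simply the number of whole steps since the Unix epoch
    return hotp(key, int(now // step), digits)


if __name__ == "__main__":
    # Shared secret from the RFC test vectors, base32-encoded for the demo
    secret = base64.b32encode(b"12345678901234567890").decode("ascii")
    print(totp(secret))
```

Both sides hold the same secret and derive the code from the current time, so the server can verify it without any message being sent to the phone.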

There are some difficulties that come with 2-factor auth. Having to enter the code can be annoying, but Google allows you to remember the computer you are on, so you only have to enter the code every 30 days. Not having your phone is another issue; Google allows you to print special "backup codes" for those cases. Then there is the issue that not everything that requires your password knows how to use 2-factor auth. For example if you are using Mail.app on your Mac to check Gmail, you'll run into a problem because Mail.app doesn't know how to ask you for the special code. There is a solution: Google allows you to generate application-specific passwords, which are special passwords designed to be entered into things like email and chat clients. They are long and complex passwords that you are not supposed to remember; instead you have the application remember them.

Specifics of this phishing attack

The link in the email was shortened with bit.ly. The phishing page didn't appear to have any attack code; there was very minimal JavaScript. It pretended to be an OAuth login page to access Remax. When a user entered their username and password the data was POSTed to a PHP script, which redirected to www.remax.com. To a lot of users it probably seemed very normal, especially if they were in the real estate business.

One interesting thing about the spam emails is that they were not sent with SMTP, which would be the easy way to send through Gmail and the like. Instead they were actually sent through the web interface. Presumably there wasn't a human clicking around, and instead it was automated, probably on a botnet. Though oddly enough one of the spam messages I saw was sent from a computer hidden behind Hide My Ass. The reason sending through the web interface is significant is twofold. Firstly, it means the hacked account's email signature will be used. An email looks much more legitimate when it comes from a friend or business associate and their normal signature is used. The other significant thing is that it raises the sending limit. With Google Apps there is a 99 recipient limit per message when sending through SMTP, but it's 500 through the web interface.

Report Phishing

If you come across a phishing site, please report it. You can report phishing sites to Google's Safe Browsing service. Google's service is used by Chrome, Firefox, and Safari to block malicious sites. It's also used by services like bit.ly to block shortened links to malicious sites. I reported the phishing site to Safe Browsing and it was promptly blocked by Chrome and bit.ly.

Stay safe on the web, know what phishing sites look like, and consider using Two-Factor Authentication for sensitive or high profile sites.

Tuesday, March 5, 2013

Apple Mac computers don't cost more than PCs

If you talk to people about buying a new laptop or read comments online you will see a consistent meme -- Macs cost way more than PCs. Let's look into that.

Base Prices

Laptop	OS	CPU	RAM	Drive	Display	Weight	Price
Dell XPS 13	Windows 8	Core i5 1.7GHz (2.6GHz max)	4GB	128 GB SSD	13.3" 1366x768	2.99 lbs	$999.99
Apple MacBook Air 13	Mac OS X 10.8	Core i5 1.8GHz (2.8GHz max)	4GB	128 GB SSD	13.3" 1440x900	2.96 lbs	$1,199.00
Lenovo ThinkPad X1 Carbon	Windows 8	Core i5 1.7GHz (2.6GHz max)	4GB	128 GB SSD	14" 1600x900	2.99 lbs	$1,249.00
Google Chromebook Pixel	Chrome OS	Core i5 1.8GHz (2.8GHz max)	4GB	32GB SSD	12.85" 2560x1700	3.35 lbs	$1,299.00
HP Envy Spectre 14	Windows 8	Core i5 1.7GHz (2.6GHz max)	4GB	128 GB SSD	14" 1600x900	3.98 lbs	$1,399.99
Apple MacBook Pro 13 with Retina	Mac OS X 10.8	Core i5 2.5GHz (3.1GHz max)	8GB	128 GB SSD	13.3" 2560x1600	3.57 lbs	$1,499.00

DoJ proves SOPA and PIPA not needed, seizes Megaupload

Yesterday I wrote "If SOPA or PIPA passes Megaupload would be kicked off the internet." Today the United States Department of Justice kicked Megaupload off the internet [DoJ][WSJ][TF]. Yes that's right, without SOPA or PIPA the US government was able to shut down a foreign "rogue" site. How foreign? Datacenters in the US, Canada, and the Netherlands were raided. Charges were brought against 7 people who are citizens of Germany, the Netherlands, Slovakia, Estonia, Turkey, Hong Kong, and New Zealand -- none of them US citizens. Four of them were arrested in New Zealand.

How did this happen without SOPA or PIPA? The same way all federal cases happen: a grand jury issued an indictment. In 2010, over the Thanksgiving weekend, US Immigration and Customs Enforcement seized 82 domains, and a year later admitted some were a mistake. And over Thanksgiving weekend of 2011 they seized 150 domains.

So the DoJ has done a great job in proving that SOPA and PIPA are not needed. Not only can they seize domains and property and arrest those involved with foreign "rogue" sites, they can do it all without due process. Killing a business and then having a trial (or not even having a trial in the case of Dajaz1) is like executing a suspect and then holding a trial to convict them. Even if Megaupload is found not guilty, they will likely never recover from having $50 million in assets seized. The company has been killed but not convicted.

This kind of skirting of due process makes sense in some cases. If you suspect someone is a terrorist and has a bomb in their backpack it makes sense to arrest them and blow up their backpack, then hold a trial. Hundreds of lives are at risk if a bomb goes off, and destroying a $30 backpack does little harm, so it's fair.

The DoJ have been so brainwashed by the RIAA and MPAA that they think pirated entertainment is as dangerous as terrorism. In 2010 Universal Music Group made $5.7672 billion in revenue. Maybe Megaupload cost them $100,000 in revenue, so without Megaupload UMG would have made $5.7673 billion. No one is dying because of Megaupload. Of course the DoJ doesn't say it cost them $100 thousand, they say Megaupload cost the industry over $500 million. They use funny math for that, like when Arista Records requested damages of $150,000 per infringing file. Or that every download of a movie costs the industry the $45 retail price of a Blu-ray disc. Realistically the amount that piracy actually costs the entertainment industry is tiny, and probably less than the amount they spend on the MPAA and RIAA.

Seriously, SOPA and PIPA are not needed. Laws and legal processes already exist to protect intellectual property. What we really need is a law to protect due process, like the Due Process Guarantee Act that Dianne Feinstein introduced. That act protects due process for terrorist suspects. Yet Dianne Feinstein is sponsoring PIPA. Apparently she thinks terrorists deserve more rights than web site owners. Guess who I'm not voting for next Senate election.