#20 - The "Leaky Forms" study
Hi there! My name is Diego Parrilla. I’m a developer who became an entrepreneur, and my latest company is Threatjammer. Subscribe now to my weekly digest about tech, threat intel, privacy, and security!
Suppose you sign up for a new digital service on the internet and mistype your email address or cancel before hitting the “Submit” button. Anyone would assume that your browser does not send the data until you confirm, right? Wrong! Something is happening in the background: even some well-known sites send your data to their own servers or to their partners without notice.
The paper Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission, scheduled for USENIX Security ’22 next August, made the front pages of several general media outlets. It highlights a dirty secret of the digital industry: several big companies harvest our Personally Identifiable Information (PII) without our consent, especially email addresses and sometimes passwords.
Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, evaluating two scenarios: a user visiting a site from the European Union and from the United States. The study shows that users’ email addresses are exfiltrated to well-known tracking, marketing, and analytics domains before form submission and without consent on 1,844 websites in the EU and 2,950 websites in the US.
They developed a crawler based on Tracker Radar Collector, the crawler framework used by DuckDuckGo, to measure email and password exfiltration on the top 100,000 sites ranked by Tranco. They used a pre-trained machine-learning classifier to detect the email and password fields in the pages. Then they automatically filled the email and password fields and intercepted the scripts with access to the input fields.
Of course, there are cases in which a website can legitimately collect emails and passwords before form submission, for instance, when the site wants to check whether the user already exists. To rule these out, the researchers excluded all requests sent to first-party domains or to third-party domains not flagged as trackers. Hence, the study focuses only on sites sending the information to third-party domains labeled as trackers.
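To make that filtering rule concrete, here is a minimal sketch in Python. It is not the researchers' code: the `TRACKER_DOMAINS` set, the `site_of` helper, and the example URLs are all hypothetical, and the "last two labels" shortcut for finding a site's domain is a simplifying assumption (a real crawler would rely on the Public Suffix List and an established tracker blocklist).

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration; not the lists used in the study.
TRACKER_DOMAINS = {"tracker.example", "ads.example"}


def site_of(url: str) -> str:
    """Naive registrable domain: the last two labels of the hostname.

    Assumption for illustration only. Hosts such as "shop.example.co.uk"
    need the Public Suffix List to be handled correctly.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def tracker_requests(page_url: str, request_urls: list[str]) -> list[str]:
    """Keep only requests sent to third-party domains flagged as trackers."""
    first_party = site_of(page_url)
    kept = []
    for url in request_urls:
        domain = site_of(url)
        if domain == first_party:
            continue  # first-party request, e.g. a "does this user exist?" check
        if domain not in TRACKER_DOMAINS:
            continue  # third party, but not flagged as a tracker
        kept.append(url)
    return kept
```

Given a page on `www.shop.example`, a request to `api.shop.example` is dropped as first-party, a request to an unlisted CDN is dropped as a non-tracker, and only a request to a host under `tracker.example` survives.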
The analysis of these trackers shows a ‘hall of shame’ of domains that exfiltrate emails or passwords even when the user has not given any consent in the Consent Management Provider pop-up. These trackers send the information hashed, encoded, or compressed. In the US, rlcdn.com, owned by TowerData, is the most prominent tracker collecting hashed email addresses. In the European Union, the number one is taboola.com, owned by Taboola, the advertising company.
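Because the data rarely travels in plain text, spotting a leak means searching each outgoing request for every form the email could take. The sketch below shows the idea in Python under my own assumptions: the function names are hypothetical, the set of encodings (Base64, MD5, SHA-1, SHA-256) is only a small sample, and it handles a single layer of encoding, so it does not cover compression or nested encodings.

```python
import base64
import hashlib
from urllib.parse import unquote


def email_variants(email: str) -> dict[str, str]:
    """Map a label to each encoded or hashed form of the email to search for."""
    raw = email.encode("utf-8")
    return {
        "plain": email,
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, payload: str) -> list[str]:
    """Return the labels of the variants found in a request URL or body."""
    # Search the payload as sent and also its percent-decoded form,
    # so that "user%40example.com" still counts as a plain-text leak.
    haystacks = (payload, unquote(payload))
    return [
        label
        for label, needle in email_variants(email).items()
        if any(needle in haystack for haystack in haystacks)
    ]
```

Calling `find_leaks` with the address typed into the form and the URL or body of an intercepted request returns a list such as `["sha256"]` when a hash of the address is present, and an empty list when nothing matches.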
One of the most concerning results of the research is password exfiltration, and the most prominent domains belong to Yandex, the Russian search engine, followed by web analytics company Mixpanel. It’s tough for me to understand why analytics tools need access to our passwords, but you can make an educated guess.
You can read the research paper here.
They have also published an interesting site with more details and information.
They have created a browser add-on to detect sites that exfiltrate emails or passwords.
The source code of the Leak Detector is on GitHub.
The music snippet
Last May 18th marked five years since Chris Cornell of Soundgarden, Temple of the Dog, and Audioslave took his own life. 😔