Best Infosec-Related Long Reads for the Week of 4/6/24

Tech giants break the rules in race for AI data, New approach needed to avert next XZ Utils backdoor, Drone 'hackers' are winning in Ukraine war, DNA-reliant cryptographic functions might someday protect passwords, House data broker bill might be too narrow

Metacurity is pleased to offer our free and premium subscribers this weekly digest of the best long-form (and longish) infosec-related pieces we couldn’t properly fit into our daily news crush. So tell us what you think, and feel free to share your favorite long reads via email at [email protected].

Image created using ByteDance on Replicate.

How Tech Giants Cut Corners to Harvest Data for A.I.

The New York Times’ Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson, and Nico Grant take a deep dive into how tech giants OpenAI, Google, and Meta ignored web service policies, skirted copyright laws and changed their own rules in a desperate race to gain access to data needed to feed their AI models.

The race to lead A.I. has become a desperate hunt for the digital data needed to advance the technology. To obtain that data, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law, according to an examination by The New York Times.

At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by The Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.

Like OpenAI, Google transcribed YouTube videos to harvest text for its A.I. models, five people with knowledge of the company’s practices said. That potentially violated the copyrights to the videos, which belong to their creators.

Last year, Google also broadened its terms of service. One motivation for the change, according to members of the company’s privacy team and an internal message viewed by The Times, was to allow Google to be able to tap publicly available Google Docs, restaurant reviews on Google Maps and other online material for more of its A.I. products.

The companies’ actions illustrate how online information — news stories, fictional works, message board posts, Wikipedia articles, computer programs, photos, podcasts and movie clips — has increasingly become the lifeblood of the booming A.I. industry. Creating innovative systems depends on having enough data to teach the technologies to instantly produce text, images, sounds and videos that resemble what a human creates.

The volume of data is crucial. Leading chatbot systems have learned from pools of digital text spanning as many as three trillion words, or roughly twice the number of words stored in Oxford University’s Bodleian Library, which has collected manuscripts since 1602. The most prized data, A.I. researchers said, is high-quality information, such as published books and articles, which have been carefully written and edited by professionals.

Backdoor in XZ Utils That Almost Happened

In Lawfare, noted cryptographer Bruce Schneier recaps the narrowly thwarted catastrophe of the XZ Utils backdoor, advocating for more spending on and a better approach to securing the obscure pieces of open-source software upon which the internet depends.

If it hadn’t been discovered, it probably would have eventually ended up on every computer and server on the internet. Though it’s unclear whether the backdoor would have affected Windows and Mac, it would have worked on Linux. Remember in 2020, when Russia planted a backdoor into SolarWinds that affected 14,000 networks? That seemed like a lot, but this would have been orders of magnitude more damaging. And again, the catastrophe was averted only because a volunteer stumbled on it. And it was possible in the first place only because the first unpaid volunteer, someone who turns out to be a national security single point of failure, was personally targeted and exploited by a foreign actor.

This is no way to run critical national infrastructure. And yet, here we are. This was an attack on our software supply chain. This attack subverted software dependencies. The SolarWinds attack targeted the update process. Other attacks target system design, development, and deployment. Such attacks are becoming increasingly common and effective, and also are increasingly the weapon of choice of nation-states.

It’s impossible to count how many of these single points of failure are in our computer systems. And there’s no way to know how many of the unpaid and unappreciated maintainers of critical software libraries are vulnerable to pressure. (Again, don’t blame them. Blame the industry that is happy to exploit their unpaid labor.) Or how many more have accidentally created exploitable vulnerabilities. How many other coercion attempts are ongoing? A dozen? A hundred? It seems impossible that the XZ Utils operation was a unique instance.

Solutions are hard. Banning open source won’t work; it’s precisely because XZ Utils is open source that an engineer discovered the problem in time. Banning software libraries won’t work, either; modern software can’t function without them. For years security engineers have been pushing something called a “software bill of materials”: an ingredients list of sorts so that when one of these packages is compromised, network owners at least know if they’re vulnerable. The industry hates this idea and has been fighting it for years, but perhaps the tide is turning.
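The "ingredients list" idea above can be made concrete with a minimal sketch: given a machine-readable bill of materials for a product, a network owner can immediately check exposure the moment a component is reported compromised. The SBOM entries below are hypothetical examples, though liblzma 5.6.0 and 5.6.1 were in fact the backdoored XZ Utils releases.

```python
# Illustrative sketch, not a real SBOM tool: an SBOM is a machine-readable
# list of the components a product ships with. When a component is reported
# compromised, matching it against the SBOM answers "are we vulnerable?"

sbom = [
    {"name": "openssh-server", "version": "9.6"},   # hypothetical entries
    {"name": "liblzma", "version": "5.6.0"},
    {"name": "zlib", "version": "1.3"},
]

# The actual backdoored XZ Utils / liblzma releases.
compromised = {("liblzma", "5.6.0"), ("liblzma", "5.6.1")}

def affected(sbom, compromised):
    """Return the SBOM entries matching known-compromised (name, version) pairs."""
    return [c for c in sbom if (c["name"], c["version"]) in compromised]

print(affected(sbom, compromised))  # -> [{'name': 'liblzma', 'version': '5.6.0'}]
```

Real SBOM formats such as SPDX and CycloneDX carry far more metadata (suppliers, hashes, licenses), but the lookup at the core is exactly this simple.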

The fundamental problem is that tech companies dislike spending extra money even more than programmers dislike doing extra work. If there’s free software out there, they are going to use it—and they’re not going to do much in-house security testing. Easier software development equals lower costs equals more profits. The market economy rewards this sort of insecurity.

We need some sustainable ways to fund open-source projects that become de facto critical infrastructure. Public shaming can help here. The Open Source Security Foundation (OSSF), founded in 2022 after another critical vulnerability in an open-source library—Log4j—was discovered, addresses this problem. The big tech companies pledged $30 million in funding after the critical Log4j supply chain vulnerability, but they never delivered. And they are still happy to make use of all this free labor and free resources, as a recent Microsoft anecdote indicates. The companies benefiting from these freely available libraries need to actually step up, and the government can force them to.

Ukraine Is the First “Hackers’ War”

In IEEE Spectrum, Juan Chulilla, cofounder of Red Team Shield S.L., analyzes how drone hackers are in the lead in the deadly game of “hacksymmetrical” electronic warfare (EW) in Ukraine, with ever-accelerating cycles of cat-and-mouse switch-outs of commercial and military-grade drone equipment to outwit adversaries.

The electronic battlefield has now become a massive game of cat and mouse. Because commercial drones have proven so lethal and disruptive, drone operators have become high-priority targets. As a result, operators have had to reinvent camouflage techniques, while the hackers who drive the evolution of their drones are working on every modification of RF equipment that offers an advantage. Besides frequency-band modification, hackers have developed and refined two-way, two-signal repeaters for drones. Such systems are attached to another drone that hovers close to the operator and well above the ground, relaying signals to and from the attacking drone. Such repeaters more than double the practical range of drone communications, and thus the EW “cats” in this game have to search a much wider area than before.

Hackers and an emerging cottage industry of war startups are raising the stakes. Their primary goal is to erode the effectiveness of jammers by attacking them autonomously. In this countermeasure, offensive drones are equipped with home-on-jam systems. Over the next several months, increasingly sophisticated versions of these systems will be fielded. These home-on-jam capabilities will autonomously target any jamming emission within range; this range, which is classified, depends on emission power at a rate that is believed to be 0.3 kilometers per watt. In other words, if a jammer has 100 W of signal power, it can be detected up to 30 km away, and then attacked. After these advances allow the drone “mice” to hunt the EW cat, what will happen to the cat?
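The detection-range figure quoted above is a simple linear rate, and the excerpt's own example follows from it directly. A one-line sketch, using only the article's stated estimate of 0.3 kilometers per watt (the true relationship is classified):

```python
# Home-on-jam detection range per the article's estimate: range scales
# linearly with jammer output power at roughly 0.3 km per watt.
KM_PER_WATT = 0.3  # rate quoted in the article; the real figure is classified

def detection_range_km(jammer_power_watts: float) -> float:
    """Approximate maximum detection range for a jammer of the given power."""
    return KM_PER_WATT * jammer_power_watts

print(detection_range_km(100))  # -> 30.0, matching the article's 100 W example
```

Note that a strictly linear power-to-range relationship is unusual for RF detection, which normally scales sub-linearly with emitter power; the figure here is reproduced as the article states it, not derived from first principles.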

The challenge is unprecedented and the outcome uncertain. But on both sides of the line you’ll find much the same kind of people doing much the same thing: hacking. Civilian hackers have for years lent their skills to such shady enterprises as narco-trafficking and organized crime. Now hacking is a major, indispensable component of a full-fledged war, and its practitioners have emerged from a gray zone of plausible deniability into the limelight of military prominence. Ukraine is the first true war of the hackers.

The implications for Western militaries are ominous. We have neither masses of drones nor masses of EW tech. What is worse, the world’s best hackers are completely disconnected from the development of defense systems. The Ukrainian experience, where a vibrant war startup scene is emerging, suggests a model for integrating maverick hackers into our defense strategies. As the first hacker war continues to unfold, it serves as a reminder that in the era of electronic and drone warfare, the most critical assets are not just the technologies we deploy but also the scale and the depth of the human ingenuity behind them.

Protecting art and passwords with biochemistry

In Nature Communications, researchers at ETH Zurich describe a new cryptographic one-way function that, rather than relying on arithmetic operations, depends on nucleotides, the chemical building blocks of DNA. If DNA testing costs decline, this function could be used to counterfeit-proof valuable art and even protect passwords.

Although mathematical one-way functions are widely used, advancements in quantum computing and the lack of proof for cryptographic security of such algorithms have led to the exploration of alternative methods. For example, Pappu et al. [18] suggested the use of an object with a disordered microstructure for cryptographic key generation. They exploited the randomness of silica spheres suspended in a hardened epoxy to map the orientation of the token in relation to a light source (input) to the resulting laser scattering pattern (output). Such functions work similarly to cryptographic hash functions, but rely on a physical source of disorder instead of number theory, and have generally been termed physical unclonable functions (PUF). PUFs are characterized by their ability to translate an input (challenge) to an output (response) through a physical system that is unique and cannot be replicated, with the challenge response pairs (CRPs) being difficult or impossible to predict. PUFs have been proposed for applications in intellectual property protection, public key cryptography and anti-counterfeiting of goods and services.

Genetic information has already been suggested as a medium for physical unclonable functions by using CRISPR-induced nonhomologous end joining repair to generate a unique barcode-indel mapping (CRISPR-PUFs). There, the random process refers to the combination of barcodes and indels in a given cell line. We instead propose to directly use randomly generated DNA sequences, giving rise to massive levels of entropy. As it was recently shown that chemical DNA synthesis can be used to generate random numbers, we envisioned that enormous random DNA pools could be used to implement a type of object-bound cryptography based on chemistry instead of physics.

In this work, we introduce chemical unclonable functions (CUFs) capable of performing calculations by controlled molecular operations. We show that such CUFs are robust, scalable and secure, and that their properties can be compared with PUFs and one-way functions. Furthermore, and in contrast to physical functions, the implemented system allows for switching between a copiable and an uncopiable state. Based on our results, we suggest use cases of decentralized multi-user authentication and non-fungible items, connecting the digital with the physical world.
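The challenge–response behavior described in the excerpt can be caricatured in software. This is an analogy only: below, a secret random pool stands in for the physical or chemical source of disorder (silica spheres, random DNA sequences), and a hash derivation stands in for the measurement. Whoever holds the token can answer any challenge, but the challenge–response pairs are unpredictable without it. The class and names are invented for illustration.

```python
# Software analogy for a physical/chemical unclonable function: a unique,
# unreplicable source of randomness maps challenges to responses. A secret
# random pool models the physical disorder; it is generated once and never
# leaves the "token". (Unlike a real PUF/CUF, this secret could of course
# be copied if extracted -- hence analogy, not implementation.)
import hashlib
import os

class ToyUnclonableFunction:
    def __init__(self, pool=None):
        # The pool models the entropy source (random DNA pool, etc.).
        self._pool = pool if pool is not None else os.urandom(32)

    def respond(self, challenge: bytes) -> str:
        """Derive the response for a challenge from the secret pool."""
        return hashlib.sha256(self._pool + challenge).hexdigest()

token = ToyUnclonableFunction()
r1 = token.respond(b"challenge-1")
assert r1 == token.respond(b"challenge-1")   # same token -> reproducible CRPs
clone = ToyUnclonableFunction()              # different "disorder", same design
assert clone.respond(b"challenge-1") != r1   # responses do not match
```

The interesting twist the paper claims over physical PUFs is switchability: the chemical system can move between a copiable state and an uncopiable one, something this digital caricature cannot capture.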

The Pros and Cons of the House’s Data Broker Bill

Justin Sherman, Lawfare contributing editor and CEO of Global Cyber Strategies, weighs the pros and cons of the Protecting Americans’ Data from Foreign Adversaries Act, a House-passed bill that would limit the data US data brokers can sell about US persons to individuals located in North Korea, China, Russia, or Iran.

It is important to not make the perfect the enemy of the good—and typically, any marginal improvement on the status quo when it comes to data brokerage would benefit consumer privacy and national security. There are several elements of the bill that are noteworthy in this respect, including the fact that its protections are aimed beyond government personnel, it would not carve out sales of covered data below a certain threshold (for example, data set size), and it has a fairly strong definition of sensitive data that it could put into federal law. At the same time, given the ongoing executive order process on bulk data transfers and national security, it is difficult to make sense of the reasoning behind this new bill and why Congress is taking such a weak approach.

The bill purports to tackle a problem similar to what the executive order is already addressing under existing law—except by creating a new law, and with an ultimately narrower scope, fewer measures to counteract circumventions, a different and non-national security-focused enforcement agency in charge, and a significantly slower timeline for identifying and mitigating national security risks. If Congress is going to spend time writing a new law focused on the data brokerage ecosystem, it should aim bigger and look beyond a national security framework to avoid coming away with a narrower and weaker enforcement regime, with less benefit to national security, than the executive branch can build under current law. The bill’s drafters are focused on important data brokerage problems, and they should be commended for injecting data brokerage into the too narrowly focused policy conversation dominated by debates about TikTok and non-U.S. apps. But the Senate should force a significant amendment and strengthening of the bill before pushing it forward and trying to address the privacy and security risks the House has highlighted.