Two greats: computer scientist Katie Bouman and Margaret Hamilton.

Katie with her awesome stack of hard drives for image data 😍.

Margaret and her Apollo Guidance Computer source code 📚.

@moiety wow, hard drive capacity sure has increased since last time I looked

I didn't think 5 petabytes would fit on a desk

@ben 1.6 PiB per stack, 200 TiB per disk 😱.

@carbontwelve @moiety I wonder how they managed to record all the data

did they have a bunch of smaller hard drives they were writing to in parallel

or did they just have some absurd amount of RAM available?

or was the process longer than I'm imagining? (although writing 200TiB to a hard drive would still take *ages*)

@ben they had multiple radio telescopes around the world collecting ultra high resolution samples.

The disks were then flown from the telescopes to two central locations, where the five petabytes of data were analysed to find the common waveform. That reduced the data set by a factor of 10,000; they then went through a few more steps until they had a few hundred TiB of data, from which the 250 KB image was generated.


@carbontwelve @moiety yeah, but 200TiB is still a lot of data to write

I'm not sure how long it would take, but definitely long enough for the Earth's rotation to affect a lot of stuff

@ben @carbontwelve I don’t think it was a 200 TiB file they were writing to a disk.

It’s more about gathering all the data from different devices. And I think your idea about writing to RAM first for certain things is close to reality.

@ben think of it more as, say, 100 MB/s of data generation per dish. If they are receiving for six hours a night, that's 2.16 TB of data generated.

They were probably generating data faster than 100 MB/s; I can't seem to find the link atm, but I recently read an astrophysics paper on how one group were collecting and storing radio astronomy data in the GB/s range.

From watching the press release they used the rotation of the Earth to their benefit as it enabled them to scan their "viewport" over the target and produce a much higher resolution image as a result.
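The back-of-the-envelope numbers above check out, by the way. A quick sketch (both figures are the rough guesses from the post, not measured values):

```python
# Rough sanity check: 100 MB/s per dish, sustained over a six-hour night.
rate_mb_per_s = 100
seconds = 6 * 3600
total_tb = rate_mb_per_s * seconds / 1_000_000  # MB -> TB (decimal units)
print(f"{total_tb} TB per dish per night")  # -> 2.16 TB
```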

@carbontwelve oh wow I'm an idiot

of *course* they were doing this over multiple nights

for some reason I had it in my head that all the telescopes were recording at once, but that can't possibly be the case because about half of them would have a very large rock between them and the black hole at any given time

Multiple sites and multiple nights help, but how fast the data can be written is still a limiting factor on the image quality. RAID helps by writing to all disks in a stack simultaneously (RAID-0 I believe), but also the first stage in data analysis is an FPGA that splits one stupendous data stream into many that are only enormous and ships them to separate computers, which then write those to disk separately.
@carbontwelve @moiety

@ben @carbontwelve @moiety
When I was using LOFAR in a mode that generated 2 TB/hr, that's essentially how we coped - the input data stream (one beam) was split into 20 channels which were each recorded by a separate server.
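The split-and-record idea can be sketched as a toy model: one fast input stream is dealt out round-robin to N independent writers, each of which only has to keep up with 1/N of the rate. The channel count and "samples" here are invented for illustration; the real systems do this in FPGAs and dedicated hardware, not Python.

```python
# Toy model: fan one fast stream out to several slower writers.
def fan_out(stream, n_channels):
    """Deal samples round-robin across n_channels independent writers."""
    channels = [[] for _ in range(n_channels)]
    for i, sample in enumerate(stream):
        channels[i % n_channels].append(sample)
    return channels

# 12 dummy samples across 4 writers: each writer gets every 4th sample
# and could flush its share to its own disk or server.
writers = fan_out(range(12), 4)
```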

@anne these are the kind of details I love to pore over.

I still can't find the paper I read a few months back, but it described the construction of a fibre optic network for a new radio telescope array and the challenge of transferring the equivalent bandwidth of the majority of the internet within one site so that it could be processed down to something more manageable.

I ❤️ astronomy.

@moiety @ben
@anne come to think of it, it may have been you or another astronomer here on the fediverse who shared it in the first place 😊

@moiety @ben

It was probably an article about the SKA, the Square Kilometre Array, the next-generation giant low/medium frequency radio telescope, meant to have a square kilometre of total collecting area in its dishes.
@moiety @ben

@carbontwelve @ben I’m not sure they are. The largest commercial ones I know are 16 TB.

I just divided the data by the number of stacks and the number of disks.

I don’t know enough about the disks/arrays themselves to comment any further.

@moiety @carbontwelve @ben the storage system is described in Paper II (open access), page 8. On each telescope there are 4 signal chains, each recording at 16 Gbps over dual 10 GigE and writing to 32 HDDs in parallel, so 128 HDDs per telescope in all, using 6-10 TB helium-filled disks. Total data size is 1-2 PB per telescope, so I guess that's just one telescope's data on the desk.
The data is formatted as VDIF packets, so while it may be split into different files on the disks, it can be reconstructed into the full stream.

@eqe @moiety @carbontwelve

that's still 64 megabytes being written per second to each hard drive, which is really impressive, but I guess you can get to that level if you're not reading anything from the drives during the entire process
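The per-disk arithmetic from the numbers quoted above works out roughly like this (assuming decimal units throughout):

```python
# Per-disk write rate implied by the recording setup described above:
# 16 Gbit/s per signal chain, striped across 32 HDDs in parallel.
chain_gbit_per_s = 16
n_disks = 32
mb_per_s_per_disk = chain_gbit_per_s * 1000 / 8 / n_disks  # Gbit/s -> MB/s
print(mb_per_s_per_disk)  # 62.5 MB/s per disk, in line with the figure above
```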

@eqe @moiety @carbontwelve

This is what happens every time I see something cool.

Like when I saw a cutscene where everyone else was amazed by the story or the graphics, and I was just sitting there going "this is clearly a pre-rendered cutscene, and my character appears in it, so they're definitely doing something very sneaky here and I need to know what it is".

@eqe thank you for sharing the link. It made for fascinating reading. @ben @moiety

@ben so turns out this photo is from 2017. Might not be all of those petabytes

@moiety Margaret Hamilton continues to amaze me. You can't go to the Moon without a computer, and she was in charge of that. Her work on Development Before The Fact is fascinating. And she coined the phrase 'software engineering'. A real pioneer.

@moiety Can we add Katherine Johnson and her hand-plotted calculations for the Mercury mission (and many more!) 😍


If we're going down this road (I approve!) I have to speak up for Jocelyn Bell Burnell, discoverer of the first pulsars. In addition to being a brilliant scientist she is also an awesome person who recognizes and is trying to do something about the inequalities in science.
