Difference between revisions of "Talk:2298: Coronavirus Genome"

Explain xkcd: It's 'cause you're dumb.
m (Use the template)
(+ proposal for Japanese spell checkers)
 

Latest revision as of 18:04, 23 November 2022


Epigenetics is a pun, right? I think it's a pun, but I don't know on what, and it's maddening. That's right, Jacky720 just signed this (talk | contribs) 23:03, 24 April 2020 (UTC)

...Epigenetics is a real thing: the study of how changes in things other than the genome itself can be passed down between generations. An example is conditioning a mouse to be scared of the smell of oranges/cherries/almonds by having it associate the scent of acetophenone with an electric shock, then testing whether its pups also have the same fear of that smell. They do, but this obviously can't be caused by the genome itself changing (no component of this has a lot of ionizing radiation[citation needed]). Whatever causes this is the topic of actual epigenetics. --Volleo6144 (talk) 00:12, 25 April 2020 (UTC)
I know that, I added the link to the article. But afaik that has nothing to do with how the genome is formatted in Word, and I think it's a pun. That's right, Jacky720 just signed this (talk | contribs) 00:31, 25 April 2020 (UTC)

Since when does Notepad have spellcheck? 172.68.226.46 23:05, 24 April 2020 (UTC)

Neither Notepad nor WordPad has spellcheck. I suspect he combined two jokes, and the link from spellcheck to Word wasn't made clear enough. Quinoje (talk) 19:35, 27 April 2020 (UTC)
Word does, so maybe she is using Word instead? Kind of contradictory. 172.69.34.46 23:14, 24 April 2020 (UTC)
I assumed Randall meant WordPad, which iirc is an upgrade from Notepad but has a really thinned-out set of Word's features. Maybe there's a spellcheck in there? (haven't used it in ~10 years) Xseo (talk) 07:47, 27 April 2020 (UTC)

Very disappointed that she's using Notepad and not Notepad++. I mean, really... Cellocgw (talk) 15:38, 27 April 2020 (UTC)

When Dr. Theall first scanned Finnegans Wake, he had to tell Microsoft the language was Old Icelandic.

The OCR kept trying to spellcheck Finnegans Wake. 15:11, 26 April 2020 (UTC)

True Story: In the 1980s, as part of the Work Experience initiative at my school, I was assigned to one of my local council's offices (I'd applied for their computer department, but someone else got that). I don't think the word-processor I used at home (Psion Exchange) had spellcheck, but the one the office used (Lotus? Can't actually recall, but it, like most things, was DOS-based) definitely had, and it was very easy to edit in new words. Inspired by the chemistry lessons I'd recently had, and some 'reports' I was asked to write (keeping the kid busy, more like!) that dealt with chemical degradation of concrete under the action of salt and suchlike, I of course added "NaCl" then absolutely any other chemical formulae I could think of. "H2SO4" was an early one (partial subscript formatting wasn't relevant to the spill-chucker) but I eventually got round to CH4, C2H6, C3H8, etc, and then as many of the derived alcohols, alkenes, alkynes, etc that I could be bothered to type in. Which were a lot. By the end I was 'confident' that nobody would ever type any correct chemical formula into that machine (no network-shared resources!) and have to worry about false-positive typo alerts. Yeah, well, I was still at school and thought I knew everything. 162.158.159.70 23:37, 24 April 2020 (UTC)

Can confirm: virus genomes are looked at in Notepad. I worked at one of the national laboratories for a summer, experimenting with ways to check for the length of a gene and strength of genetic expression in various circumstances in E. coli. We used Notepad because even old computers can open very large files without difficulty, and all our scripts were in Perl, which can easily output to .rtf or .txt file formats. These files are huge, by the way. If you hold down on the scroll bar so it's zooming to the bottom, you could be waiting 20 minutes to reach the end depending on the number of kilobase pairs in your microbe. And epigenetics is not a pun. It's a real word. 172.68.143.192 00:15, 25 April 2020 (UTC)

even old computers can open very large files without difficulty - Depending on what you mean by "old" and "very large" that may well not be true. In Windows 3.x, Notepad could open files as large as 54 KB, increasing to 64 KB in Windows 95, 512 MB in Windows 8 and 1 GB in Windows 10. I don't know which of those would fit a typical virus genome, but I'm guessing it's not all of them. 162.158.187.151 13:43, 27 April 2020 (UTC)
Well, SARS-CoV-2 has around 30 kb, and that's considered big already. Since a base is a letter and thus a byte, a viral genome usually fits in the old Notepad. But here is the catch: when people align things, you get that number multiplied by however many genomes they are looking at. And don't even talk about the Nucleocytoviricota-whatsoever-twats. --162.158.179.12 06:11, 5 May 2020 (UTC)
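To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It assumes one byte per base plus one newline per 60 bases (a common line width for sequence text files, but an assumption here), ignores header lines, uses 29,903 bases as an approximate SARS-CoV-2 genome length, and takes the Notepad limits exactly as quoted in the comment above without verifying them.

```python
import math

# Notepad size limits as quoted in the comment above (not independently verified).
NOTEPAD_LIMITS = {
    "Windows 3.x": 54 * 1024,
    "Windows 95": 64 * 1024,
    "Windows 8": 512 * 1024**2,
    "Windows 10": 1024**3,
}

GENOME_LENGTH = 29_903  # approximate SARS-CoV-2 genome length in bases
LINE_WIDTH = 60         # assumed number of bases per line in the text file


def text_file_size(n_bases: int, line_width: int = LINE_WIDTH) -> int:
    """Bytes needed to store n_bases as plain ASCII with one newline per line."""
    lines = math.ceil(n_bases / line_width)
    return n_bases + lines  # one byte per base plus one "\n" per line


def max_genomes(limit: int, genome_length: int = GENOME_LENGTH) -> int:
    """How many genomes, stored one after another, fit under a byte limit."""
    return limit // text_file_size(genome_length)


if __name__ == "__main__":
    print(f"One genome: {text_file_size(GENOME_LENGTH)} bytes")
    for version, limit in NOTEPAD_LIMITS.items():
        print(f"{version}: fits {max_genomes(limit)} genome(s)")
```

Under these assumptions one genome is about 30,400 bytes, so it fits even the oldest limit quoted, but only one (Windows 3.x) or two (Windows 95) genomes fit at once, which is the alignment catch described above.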

Concurrent to the work in the medical community, work is underway in various open source software communities to fix bugs and other issues with software (eg genome analysis tools) that is useful to the scientists combatting COVID-19. These include the Debian "biohackathon" (https://lwn.net/Articles/816280/) as well as support from Mozilla (https://lwn.net/Articles/816386/). Parallel to these efforts, the FSF (Free Software Foundation) has focused on the shortage of medical equipment: https://lwn.net/Articles/816392/ 108.162.242.5 00:34, 25 April 2020 (UTC)

I’m suddenly inspired to write a DNA-edit-mode for Emacs (if it doesn’t have it already) which would allow for the virus spell check as described in this comic. 172.69.63.153 04:16, 25 April 2020 (UTC)

The dna-mode for Emacs does exist. Google for it. It is not very useful for real work, though. Heikkil (talk) 04:40, 26 April 2020 (UTC)
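The comic's "virus spell check" can be approximated very crudely by comparing a sequence base by base against a trusted reference and flagging every difference, much like red squiggles under typos. This is a minimal Python sketch that assumes the two sequences are already aligned to the same length with no insertions or deletions (real tools have to align first). The function name and example sequences are made up for illustration.

```python
def flag_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for every mismatch.

    Assumes the sequences are already aligned and have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if ref != alt
    ]


print(flag_differences("ACGTACGTAC", "ACGTTCGTAA"))  # prints [(5, 'A', 'T'), (10, 'C', 'A')]
```

Comparing case-insensitively means lowercase database text and uppercase query text are treated alike; everything else, such as deciding whether a difference matters, is left out.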

Derek Lowe has some insights about actual coronavirus mutations here, if you are interested.

Given coronavirus has an RNA genome, shouldn't all the 'T's be replaced by 'U's?

It is standard practice not to use U's in public sequence databases. It simplifies things. Heikkil (talk) 04:40, 26 April 2020 (UTC)
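If you do want to see such a sequence in RNA notation, the conversion is a plain letter substitution, because the database writes the RNA genome with T standing in for U. A minimal Python sketch; the function name is made up for illustration:

```python
def dna_to_rna_notation(seq: str) -> str:
    """Rewrite a sequence stored with DNA letters using RNA letters (T -> U).

    Case is preserved, so lowercase database text stays lowercase.
    """
    return seq.translate(str.maketrans({"T": "U", "t": "u"}))


print(dna_to_rna_notation("TACTAGCGTGCC"))  # prints UACUAGCGUGCC
```

This only changes the notation; going back is the same substitution with U and T swapped.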

The sequence in the transcript does not actually appear on the site mentioned in the explanation. In fact, when I google for 'TACTAGCGTGCCTTTGTAAGCACAAGCTGATTAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTA' I only get this particular site. 141.101.104.221 (talk) 07:00, April 25, 2020 (please sign your comments with ~~~~)

To find this (or any) sequence, go to BLAST and paste the query into the box. You will receive a list of the best matches (10, 50 or 100 in a standard search); this should look like [1].

Interestingly, this is a US-specific strain of the virus (top result currently is "Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/NC_0025/2020"). Tier666 (talk) 23:21, 25 April 2020 (UTC)
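BLAST itself is far more involved (scoring, extension, statistics), but its first step is finding short exact "word" hits shared by the query and the database. Here is a toy Python sketch of only that seeding step. The word length of 11 is an arbitrary choice for the example, and the reference string is just the first sequence line quoted further down this page, used as a stand-in for a database.

```python
from collections import defaultdict


def build_word_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in the reference to its 0-based start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_seeds(query: str, reference: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, reference_pos) pairs where a length-k word matches exactly."""
    index = build_word_index(reference, k)
    hits = []
    for q in range(len(query) - k + 1):
        for r in index.get(query[q:q + k], []):
            hits.append((q, r))
    return hits


reference = "aatccagtaatggaaccaatttatgatgaaccgacgacgactactagcgtgcctttgtaa".upper()
print(find_seeds("TACTAGCGTGCCTTTG", reference))
# [(0, 41), (1, 42), (2, 43), (3, 44), (4, 45), (5, 46)]
```

All six hits lie on the same diagonal (reference_pos - query_pos = 41), which is the kind of signal a real search would then try to extend into a longer alignment.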

Well, obviously it's a new variant, yet unknown to other clinical studies, of RNA that has switched to looking like DNA, so this is a hot discovery! 162.158.159.142 12:05, 25 April 2020 (UTC)
The site shows several views of the public database entry that are easier for humans to understand than the raw sequence. Click the 'View: TEXT' link and scroll down. The relevant lines look like this:
     aatccagtaa tggaaccaat ttatgatgaa ccgacgacga ctactagcgt gcctttgtaa     26220
     gcacaagctg attagtacga acttatgtac tcattcgttt cggaagagac aggtacgtta     26280
As you can see, these are not meant to be searched for and compared in "a notepad". For the same reason, Google does not index DNA sequence database entries. There are specialised tools for that.
The sequences were published this month, so they are available only in the most recent sequence database updates. Heikkil (talk) 04:40, 26 April 2020 (UTC)
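To show why a plain-text search of that view finds nothing, here is a small Python sketch that strips the spacing and coordinates from the two lines quoted above and then searches the joined sequence. It assumes each line is groups of bases followed by the 1-based coordinate of the line's last base, as in the excerpt; it is not a parser for any complete database format, and the function names are made up.

```python
QUERY = "TACTAGCGTGCCTTTGTAAGCACAAGCTGATTAGTACGAACTTATGTACTCATTCGTTTCGGAAGAGACAGGTACGTTA"

# The two sequence lines quoted above: groups of ten bases, then the
# coordinate of the last base on the line.
RECORD_LINES = [
    "     aatccagtaa tggaaccaat ttatgatgaa ccgacgacga ctactagcgt gcctttgtaa     26220",
    "     gcacaagctg attagtacga acttatgtac tcattcgttt cggaagagac aggtacgtta     26280",
]


def parse_sequence_lines(lines: list[str]) -> tuple[str, int]:
    """Join the bases from the lines; return (sequence, 1-based start of first base)."""
    sequences = []
    ends = []
    for line in lines:
        *chunks, end = line.split()  # last token is the coordinate
        sequences.append("".join(chunks))
        ends.append(int(end))
    start = ends[0] - len(sequences[0]) + 1
    return "".join(sequences).upper(), start


def locate(query: str, lines: list[str]):
    """Return the 1-based coordinate where query starts, or None if it is absent."""
    sequence, start = parse_sequence_lines(lines)
    offset = sequence.find(query.upper())
    return None if offset == -1 else start + offset


print(QUERY.lower() in "\n".join(RECORD_LINES))  # prints False: spaces get in the way
print(locate(QUERY, RECORD_LINES))               # prints 26202
```

The match starts at coordinate 26202, one base into the fifth group of the first line (ctactagcgt), which is why the search string in the transcript begins with tactagcgt.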


I have had trouble opening .txt files of even a hundred KB in Notepad! Sometimes it even crashes... It's one of the reasons I started using Notepad++. Notepad++ also happens to have a very extensible spellcheck, & language-specific formatting options. Since I often need to use Windows machines, it's one of my most frequently installed apps, after 7-Zip. ProphetZarquon (talk) 18:03, 25 April 2020 (UTC)

The Grammar Checker concept only has a limited analytical sophistication, though I don't doubt it'd still be enough to get a Nobel given the complexity of the task of deriving trivially feasible sequences from total codswallop. I also added the "next step" (probably much more than a single step) when I revised things, but that might actually be overstepping the explanation of the comic and could be removed. 162.158.155.122 20:32, 25 April 2020 (UTC)

Thanks for mentioning this in the discussion area, as I wondered what that "next step" line meant when I read it a little while ago, let alone how it related to the comic. I'll go ahead and trim that last "next step" sentence off the end, as I think it is unnecessary. Ianrbibtitlht (talk) 03:36, 26 April 2020 (UTC)

Is using Notepad to analyse RNA sequences more or less sane than using a spam filter to play chess? - Angel (talk) 00:43, 27 April 2020 (UTC)

Is that filter used to prevent emails pretending to be from Czech mates looking to give you a knight to remember in a message full of pawn images? 162.158.158.211 15:10, 27 April 2020 (UTC)

Just stumbled on this. I wonder if Japanese spell checker tech (like many logographic scripts, words aren't separated by whitespace) would work for strings of nucleotide letters. Normally, you try to match the longest possible strings with algorithms like BLAST, but maybe the spellcheckers get so much optimization that they're more efficient. Or maybe spellcheckers should use BLAST. Ericprud (talk) 18:04, 23 November 2022 (UTC)
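For comparison, the simplest dictionary-based way to split unspaced text into words, long used as a baseline for Chinese and Japanese segmentation, is greedy longest match: at each position, take the longest dictionary word that fits. Real Japanese analyzers generally score many candidate segmentations instead of committing greedily. Here is a toy Python sketch applied to a nucleotide string; the "vocabulary" of sequence words is entirely hypothetical.

```python
def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word starting there;
    if none matches, emit a single character as an unknown token.
    """
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens


words = {"TACTAGCGT", "GCC", "GCCTTTG", "TAA"}  # hypothetical "vocabulary"
print(longest_match_segment("TACTAGCGTGCCTTTGTAA", words))
# ['TACTAGCGT', 'GCCTTTG', 'TAA']
```

Greedy matching picks GCCTTTG over GCC here, and in general it can commit to a long word that forces a poor split later, which is one reason practical segmenters and sequence aligners score alternatives rather than always taking the longest match.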