Facets (new session)
Description
Metadata
Settings
owl:sameAs
Inference Rule:
b3s
b3sifp
dbprdf-label
facets
http://dbpedia.org/resource/inference/rules/dbpedia#
http://dbpedia.org/resource/inference/rules/opencyc#
http://dbpedia.org/resource/inference/rules/umbel#
http://dbpedia.org/resource/inference/rules/yago#
http://dbpedia.org/schema/property_rules#
http://www.ontologyportal.org/inference/rules/SUMO#
http://www.ontologyportal.org/inference/rules/WordNet#
http://www.w3.org/2002/07/owl#
ldp
oplweb
skos-trans
virtrdf-label
None
About:
What if we perceive SARS-CoV-2 genomes as documents? Topic modelling using Latent Dirichlet Allocation to identify mutation signatures and classify SARS-CoV-2 genomes
Goto
Sponge
NotDistinct
Permalink
An Entity of Type :
schema:ScholarlyArticle
, within Data Space :
covidontheweb.inria.fr
associated with source
document(s)
Type:
Academic Article
research paper
schema:ScholarlyArticle
New Facet based on Instances of this Class
Attributes
Values
type
Academic Article
research paper
schema:ScholarlyArticle
isDefinedBy
Covid-on-the-Web dataset
title
What if we perceive SARS-CoV-2 genomes as documents? Topic modelling using Latent Dirichlet Allocation to identify mutation signatures and classify SARS-CoV-2 genomes
Creator
Mande, Sharmila
Nagpal, Sunil
Srivastava, Divyanshu
source
BioRxiv
abstract
Topic modeling is frequently employed for discovering structures (or patterns) in a corpus of documents. Its utility in text-mining and document retrieval tasks in various fields of scientific research is rather well known. An unsupervised machine learning approach, Latent Dirichlet Allocation (LDA) has particularly been utilized for identifying latent (or hidden) topics in document collections and for deciphering the words that define one or more topics using a generative statistical model. Here we describe how SARS-CoV-2 genomic mutation profiles can be structured into a ‘Bag of Words’ to enable identification of signatures (topics) and their probabilistic distribution across various genomes using LDA. Topic models were generated using ~47000 novel corona virus genomes (considered as documents), leading to identification of 16 amino acid mutation signatures and 18 nucleotide mutation signatures (equivalent to topics) in the corpus of chosen genomes through coherence optimization. The document assumption for genomes also helped in identification of contextual nucleotide mutation signatures in the form of conventional N-grams (e.g. bi-grams and tri-grams). We validated the signatures obtained using LDA driven method against the previously reported recurrent mutations and phylogenetic clades for genomes. Additionally, we report the geographical distribution of the identified mutation signatures in SARS-CoV-2 genomes on the global map. Use of the non-phylogenetic albeit classical approaches like topic modeling and other data centric pattern mining algorithms is therefore proposed for supplementing the efforts towards understanding the genomic diversity of the evolving SARS-CoV-2 genomes (and other pathogens/microbes).
has issue date
2020-08-20
(
xsd:dateTime
)
bibo:doi
10.1101/2020.08.20.258772
has license
biorxiv
sha1sum (hex)
24c534c1f480d949f6ceeab0a60c1d7958ee34af
schema:url
https://doi.org/10.1101/2020.08.20.258772
resource representing a document's title
What if we perceive SARS-CoV-2 genomes as documents? Topic modelling using Latent Dirichlet Allocation to identify mutation signatures and classify SARS-CoV-2 genomes
schema:publication
bioRxiv
resource representing a document's body
covid:24c534c1f480d949f6ceeab0a60c1d7958ee34af#body_text
is
schema:about
of
named entity 'conventional'
named entity 'genomes'
named entity 'utility'
named entity 'SARS-CoV-2'
named entity 'classify'
»more»
◂◂ First
◂ Prev
Next ▸
Last ▸▸
Page 1 of 5
Go
Faceted Search & Find service v1.13.91 as of Mar 24 2020
Alternative Linked Data Documents:
Sponger
|
ODE
Content Formats:
RDF
ODATA
Microdata
About
OpenLink Virtuoso
version 07.20.3229 as of Jul 10 2020, on Linux (x86_64-pc-linux-gnu), Single-Server Edition (94 GB total memory)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2025 OpenLink Software