<?xml version="1.0" encoding="UTF-8"?>
<item xmlns="http://omeka.org/schemas/omeka-xml/v5" itemId="1400" public="1" featured="0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://omeka.org/schemas/omeka-xml/v5 http://omeka.org/schemas/omeka-xml/v5/omeka-xml-5-0.xsd" uri="https://www.socictopen.socict.org/items/show/1400?output=omeka-xml" accessDate="2026-04-18T08:16:55+00:00">
  <fileContainer>
    <file fileId="1400">
      <src>https://www.socictopen.socict.org/files/original/93d9649dbef3498103e50af081b8d170.pdf</src>
      <authentication>e21a3181976332f0263394e47833cf34</authentication>
    </file>
  </fileContainer>
  <collection collectionId="1">
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="1">
                <text>Coronavirus</text>
              </elementText>
            </elementTextContainer>
          </element>
          <element elementId="41">
            <name>Description</name>
            <description>An account of the resource</description>
            <elementTextContainer>
              <elementText elementTextId="2">
                <text>Dominio científico: Coronavirus</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </collection>
  <itemType itemTypeId="1">
    <name>Text</name>
    <description>A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.</description>
  </itemType>
  <elementSetContainer>
    <elementSet elementSetId="1">
      <name>Dublin Core</name>
      <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>
      <elementContainer>
        <element elementId="50">
          <name>Title</name>
          <description>A name given to the resource</description>
          <elementTextContainer>
            <elementText elementTextId="13387">
              <text>A k-mer grammar analysis to uncover maize regulatory architecture</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="39">
          <name>Creator</name>
          <description>An entity primarily responsible for making the resource</description>
          <elementTextContainer>
            <elementText elementTextId="13388">
              <text>Maria Katherine Mejia-Guerra, Edward S. Buckler</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="41">
          <name>Description</name>
          <description>An account of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="13389">
              <text>Abstract Background Only a small percentage of the genome sequence is involved in regulation of gene expression, but to biochemically identify this portion is expensive and laborious. In species like maize, with diverse intergenic regions and lots of repetitive elements, this is an especially challenging problem that limits the use of the data from one line to the other. While regulatory regions are rare, they do have characteristic chromatin contexts and sequence organization (the grammar) with which they can be identified. Results We developed a computational framework to exploit this sequence arrangement. The models learn to classify regulatory regions based on sequence features - k-mers. To do this, we borrowed two approaches from the field of natural language processing: (1) “bag-of-words” which is commonly used for differentially weighting key words in tasks like sentiment analyses, and (2) a vector-space model using word2vec (vector-k-mers), that captures semantic and linguistic relationships between words. We built “bag-of-k-mers” and “vector-k-mers” models that distinguish between regulatory and non-regulatory regions with an average accuracy above 90%. Our “bag-of-k-mers” achieved higher overall accuracy, while the “vector-k-mers” models were more useful in highlighting key groups of sequences within the regulatory regions. Conclusions These models now provide powerful tools to annotate regulatory regions in other maize lines beyond the reference, at low cost and with high accuracy.</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="40">
          <name>Date</name>
          <description>A point or period of time associated with an event in the lifecycle of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="13390">
              <text>2019</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="49">
          <name>Subject</name>
          <description>The topic of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="13391">
              <text>Gene regulatory regions, machine learning models, Crops genomics</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="43">
          <name>Identifier</name>
          <description>An unambiguous reference to the resource within a given context</description>
          <elementTextContainer>
            <elementText elementTextId="13392">
              <text>DOI: 10.1186/s12870-019-1693-2</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="48">
          <name>Source</name>
          <description>A related resource from which the described resource is derived</description>
          <elementTextContainer>
            <elementText elementTextId="13393">
              <text>BMC Plant Biology</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="45">
          <name>Publisher</name>
          <description>An entity responsible for making the resource available</description>
          <elementTextContainer>
            <elementText elementTextId="13394">
              <text>BMC</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="38">
          <name>Coverage</name>
          <description>The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant</description>
          <elementTextContainer>
            <elementText elementTextId="13395">
              <text>Botany</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="44">
          <name>Language</name>
          <description>A language of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="13396">
              <text>EN</text>
            </elementText>
          </elementTextContainer>
        </element>
      </elementContainer>
    </elementSet>
  </elementSetContainer>
</item>
