DNA BARCODING OF EDELWEISS (Anaphalis longifolia) FROM NORTH SUMATRA USING THE maturase K GENE SEQUENCE

Anaphalis longifolia is a member of the Asteraceae family distributed across the highlands of Europe, America, and Asia. Research on this plant is still limited to habitat studies, while molecular identification has not yet been carried out. This study aims to analyze the DNA barcode of A. longifolia using the matK gene sequence. DNA was isolated from samples collected in North Sumatra, amplified using specific primers, and then sequenced. The sequencing results were analyzed using Molecular Evolutionary Genetics Analysis (MEGA) version X. The results showed that the matK gene sequence was successfully amplified at a length of 800-850 bp. Phylogenetic tree analysis showed that the matK gene sequence can group A. longifolia. In the A. longifolia matK gene sequence, the AT content was higher than the GC content. The genetic distances obtained ranged from 0 to 0.0014. Alignment analysis of the matK gene sequence showed 1521 observable characters, comprising 1403 conserved sites, 118 variable sites, 9 parsimony-informative sites, and 7 single nucleotide polymorphism (SNP) sites. The matK gene sequence can be used as a DNA barcode to identify A. longifolia. The results of this study are expected to provide important information for the conservation of A. longifolia.


Introduction
Anaphalis is a member of the Asteraceae family (Tjitrosoedirdjo, 2002) that is widespread across mountainous areas of Europe, America, and Asia (Chanchani et al., 2011). Anaphalis thrives at altitudes between 800 and 3400 m asl (Backer & van den Brink, 1965). Most species of this genus are found in highlands and mountains (Prakasa et al., 2018). Because it can thrive in nutrient-poor environments, Anaphalis is considered to have high ecological value (Aliadi et al., 1990).
From 2001 to 2019, North Sumatra lost 23% (1.33 Mha) of its tree cover, equivalent to the capacity to absorb 549 Mt of CO2 emissions. Mandailing Natal Regency is the region with the largest reduction in tree cover (147 kha) (Global Forest Watch, 2020). Forest degradation and climate change are the main reasons Anaphalis is becoming increasingly difficult to find. In addition to growing in critical environments, A. longifolia has low seed viability and very slow growth, which makes it difficult to conserve.
Plant identification using DNA barcoding is one tool that can be used in conservation efforts. DNA barcoding is used to identify, inventory, and study specimens in order to understand species diversity and evaluate the genetic variability of species (Krishna Krishnamurthy & Francis, 2012). With DNA barcoding, researchers can identify species more quickly and thoroughly, and can then take appropriate action to establish the right scale for conservation (Francis et al., 2010).
Anaphalis longifolia is a member of the genus Anaphalis (Koster, 1941). The IUCN Red List (2008) classified Anaphalis spp. as threatened or endangered plants. Research on A. longifolia is still limited to its ecological status and distribution (Taufiq et al., 2013). DNA barcoding with the matK gene has not previously been used to identify this species. This study aims to analyze the potential of the matK gene as a DNA barcode for A. longifolia plants from North Sumatra. This research is expected to provide important information on how to identify A. longifolia in support of its conservation in North Sumatra.

DNA Extraction
Fresh leaves of A. longifolia were extracted using the Geneaid Plant DNA Isolation Kit following the kit protocol. A total of 100 mg of leaf tissue was crushed and placed in a 1.5 ml microcentrifuge tube, and 400 µl of GP1 lysis buffer and 5 µl of RNase A were added. After homogenization and incubation at 60 °C for 10 minutes, 200 µl of elution buffer and 100 µl of GP2 buffer were added. The mixture was then transferred to a filter column in a 2 ml collection tube and centrifuged at 1,000 x g for 1 minute. The filter column containing the supernatant was removed and replaced with a new filter column. The solution in the collection tube was transferred to the new filter column, and GP3 buffer was added at 1.5 times the volume of the solution. The GD column and collection tube were then centrifuged at 16,000 x g for 2 minutes. The DNA was washed twice, using W1 buffer and wash buffer. The DNA in the GD column was then eluted with 100 µl of elution buffer preheated to 60 °C. The collection tube was then replaced with a microcentrifuge tube. After centrifugation at 16,000 x g for 30 seconds, the DNA collected in the 1.5 ml microcentrifuge tube was stored at -20 °C.

DNA Amplification
The matK sequence was amplified using the primers matK-F 5'-ACC CAG TCC ATC TGG AAA TCT TGG TTC-3' and matK-R 5'-CGT ACA GTA CTT TTG TGT TTA CGA G-3' (Ki-Joong Kim, School of Life Sciences and Biotechnology, Korea University, Korea, unpublished). Amplification was carried out using the MyTaq HS Red Mix (Bioline) kit in a total reaction volume of 25 µl (2.5 µl of DNA template; 2.5 µl of matK-F primer; 2.5 µl of matK-R primer; 5 µl of distilled water; 12.5 µl of PCR mix). The amplification program consisted of predenaturation at 97 °C for 5 minutes, denaturation at 94 °C for 1.5 minutes, annealing at 52 °C for 1 minute, and extension at 72 °C for 1 minute. PCR results were visualized on agarose gel. PCR products that showed clear DNA bands were sent to the FirstBase DNA Sequencing Service in Singapore for sequencing.
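As a quick arithmetic check on the reaction described above, the following Python sketch sums the listed component volumes and scales the shared components into a master mix. The function names and the 10% pipetting overage are illustrative assumptions and are not part of the reported protocol.

```python
# Per-reaction volumes in microliters, as listed in the text.
REACTION_UL = {
    "DNA template": 2.5,
    "matK-F primer": 2.5,
    "matK-R primer": 2.5,
    "distilled water": 5.0,
    "PCR mix": 12.5,
}


def total_volume(reaction):
    """Return the total volume of one reaction in microliters."""
    return sum(reaction.values())


def master_mix(reaction, n_reactions, overage=0.10, exclude=("DNA template",)):
    """Scale the shared components for n reactions.

    The overage (hypothetical default: 10%) compensates for pipetting loss.
    Components named in `exclude` are added to each tube individually.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in reaction.items()
        if name not in exclude
    }


if __name__ == "__main__":
    print("Total per reaction (µl):", total_volume(REACTION_UL))
    for name, volume in master_mix(REACTION_UL, n_reactions=8).items():
        print(f"{name}: {volume} µl")
```

The per-reaction total confirms that the five listed components add up to the stated 25 µl.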

Data Analysis
The sequencing results, in the form of chromatograms, were edited using BioEdit 7.0.1 to obtain a consensus sequence based on the conserved regions of the reads generated with the matK-F and matK-R primers. The resulting consensus sequence was then compared with database sequences using the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information (NCBI). Sequences with high similarity to the sample were included in the phylogenetic tree analysis. Phylogenetic trees were constructed using the Molecular Evolutionary Genetics Analysis (MEGA) X program (Kumar et al., 2018). Analyses were performed to calculate the percentage of similarity, GC content, and genetic distance.
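The idea behind building a consensus from forward and reverse reads can be illustrated with a small Python sketch. It assumes two reads that already cover exactly the same region and have equal length, which is a simplification: in practice the reads are trimmed, aligned, and checked against the chromatograms in BioEdit. The function names are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simple_consensus(forward_read, reverse_read):
    """Combine a forward read with the reverse complement of a reverse read.

    Assumption: both reads span the same region, so that after reverse
    complementing they can be compared position by position. Positions
    where the two reads disagree are reported as 'N'.
    """
    forward = forward_read.upper()
    reverse_rc = reverse_complement(reverse_read).upper()
    if len(forward) != len(reverse_rc):
        raise ValueError("reads must cover the same region (equal length)")
    return "".join(f if f == r else "N" for f, r in zip(forward, reverse_rc))


if __name__ == "__main__":
    # Toy reads, not real matK data.
    print(simple_consensus("ATGCCA", "TGTCAT"))
```

Positions marked N are those that would need manual inspection of the chromatogram peaks.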

Results and Discussion
We succeeded in amplifying the matK gene sequence from the total genomic DNA of A. longifolia, which was then analyzed as a DNA barcode (Figure 1). The amplification products ranged from 800 to 850 bp.

Figure 1.
The results of visualization of A. longifolia matK gene PCR product using 1% agarose gel with 1 kb marker.
PCR products that showed positive visualization results on agarose gel were then sequenced. BLAST analysis of the sequencing results on NCBI showed that the sequences with high similarity to the sample were primarily from the Asteraceae family. Species with a high level of similarity according to the BLAST analysis were Anaphalis margaritacea (99.87%), Anaphalioides mariae (99.87%), Anaphalis hancockii (99.75%), Helichrysum felinum (99.62%), Anaphalis aureopunctata (99.40%), and Anaphalis sinica (99.15%). Phylogenetic studies show that Anaphalis is closely related to Helichrysum and Pseudognaphalium (Smissen et al., 2011; Ward et al., 2009). The BLAST results, which returned several different species at high similarity, indicated that the matK marker is not very effective when used as a DNA barcode for A. longifolia. Some researchers have suggested combining the matK and rbcL markers to determine DNA barcodes (Saarela et al., 2013; Wattoo et al., 2016; Hollingsworth et al., 2009; Techen et al., 2014).

Figure 2.
Phylogenetic tree based on the matK gene sequence of Anaphalis longifolia, with Helianthus annuus as the outgroup, reconstructed using the Neighbor-Joining method, with evolutionary distances calculated using the Kimura 2-parameter method (Kimura, 1980). Bootstrap support is given as the percentage of replicate trees in which the associated taxa clustered together (1000 replicates) (Felsenstein, 1985).
The phylogenetic tree in Figure 2 shows that the matK gene sequence of A. longifolia can distinguish this species from other members of the genus Anaphalis and from Helianthus annuus, the outgroup from the Asteraceae family. This shows that the matK gene has the potential to be used as a DNA barcode for A. longifolia, but it is less effective at distinguishing between species within the genus Anaphalis.
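The Kimura 2-parameter distance used for the tree can be sketched in a few lines of Python. With P the proportion of transitions and Q the proportion of transversions between two aligned sequences, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The function below is a minimal illustration, not MEGA's implementation; skipping gapped or ambiguous positions pair by pair is an assumption about how missing data are handled.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Toy sequences with one transition in ten sites, not real matK data.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Because transitions and transversions are weighted separately, the distance is slightly larger than the raw proportion of differing sites whenever any differences are present.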
Analysis using ClustalW in MEGA X shows that there are 1521 observable characters. Of these, 1403 are conserved sites, 118 are variable sites, and 9 are parsimony-informative sites. The matK gene sequence provides more parsimony-informative sites for phylogenetic analysis than other chloroplast genes (Müller et al., 2006; Barthet & Hilu, 2007).
The results of the matK sequence analysis of A. longifolia show that the AT content was higher than the GC content, as in the other members of the Asteraceae family analyzed (Table 1). Variation in GC content is a key genome feature because it is closely related to fundamental elements of genome organization (Eyre-Walker & Hurst, 2001; Mukhopadhyay et al., 2007). GC-rich regions show higher gene density, higher mutation rates, and higher recombination rates than GC-poor regions (Niu et al., 2017). The GC content of 65 accessions of Edelweiss (Leontopodium) from the Himalayan/Tibetan centre, based on nuclear ribosomal (ITS and ETS) and plastid (matK and trnL-F) sequences, ranged from 43% to 52% (Blöch et al., 2010). The Anapalis selengensis genome has 37.46% GC content and 62.54% AT content (Meng et al., 2019).
The genetic distance analysis of Anaphalis longifolia against species from the genus Anaphalis and the Asteraceae family showed that the distances between species within the genus Anaphalis ranged from 0 to 0.014 (Table 2). The highest variation was found in Anaphalis sinica and the lowest in Anaphalis longifolia. The genetic distance between the genus Anaphalis and the outgroup Helianthus annuus ranged from 0.067 to 0.074. The genetic distance in the Leontopodium ITS region ranged from 0.2% to 6.8% (Blöch et al., 2010). A study by Ade et al. (2019), which analyzed the genetic distance of Anaphalis spp. (A. javanica, A. longifolia, and A. viscida) based on molecular characters (ITS, ETS, and EST-SSR markers), showed that the genetic distance ranged from 0.004 to 0.040, indicating small genetic distances between species in the genus Anaphalis.
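The GC and AT percentages discussed above are simple base-composition ratios. A minimal Python sketch is given below; the function name is hypothetical, and ignoring gaps and ambiguous bases when counting is an assumption about how the percentages were computed.

```python
def base_composition(seq):
    """Return GC and AT content as percentages of the unambiguous bases."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # gaps and ambiguous bases are not counted
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = 100 * (counts["G"] + counts["C"]) / total
    return {"GC": gc, "AT": 100 - gc}


if __name__ == "__main__":
    # Toy sequence, not real matK data.
    print(base_composition("AATTAATTGC"))
```

Since the two percentages always sum to 100, an AT content above 50% is the same statement as a GC content below 50%.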
In this study, the alignment of the matK gene sequences from nine sequences of the genus Anaphalis indicated that there were 7 single nucleotide polymorphism (SNP) sites (Table 3), namely sites 492, 504, 505, 506, 1061, 1068, and 1176. No SNP site was detected in A. longifolia. An SNP is defined as a variant at a specific genomic location that occurs in at least 1% of the population (Kim & Misra, 2007). SNPs are among the stable genetic polymorphisms in a genome and can be used to analyze differences between closely related species (Germano & Klein, 1999; Yamamoto et al., 2010). matK and rbcL showed high sequence quality but provided only a few SNP sites (Huang et al., 2014), and are considered highly suitable for species identification (Hollingsworth et al., 2009).
Table 3. Single nucleotide polymorphism in the Anaphalis genus
Sample | Nucleotide base site: 492 | 504 | 505 | 506 | 1061 | 1068 | 1176
In this study, the matK gene sequence of A. longifolia has the potential to be used as a DNA barcode. Lahaye et al. (2008) proposed that matK could serve as a DNA barcode in plants. The matK gene sequence is one of the most rapidly evolving sequences of the plastid genome (Hilu & Liang, 1997) and possibly the closest plant analogue to the COI gene sequence used in animal DNA barcodes (Hollingsworth et al., 2011). The matK gene sequence can be very difficult to amplify by PCR using existing primer sets, especially in non-angiosperm plants (Hollingsworth et al., 2011).
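The site categories reported from the alignment (conserved, variable, parsimony-informative) can be derived column by column. The Python sketch below uses the usual definitions: a variable site has at least two different nucleotides, and a parsimony-informative site has at least two different nucleotides that each occur in at least two sequences. It is an illustration with a hypothetical function name, not MEGA's implementation, and treating gap-only columns as conserved is an assumption.

```python
from collections import Counter


def classify_sites(alignment):
    """Classify alignment columns; positions are reported 1-based."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    summary = {"conserved": [], "variable": [], "parsimony_informative": []}
    for pos in range(lengths.pop()):
        column = [seq[pos].upper() for seq in alignment]
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) <= 1:
            summary["conserved"].append(pos + 1)
            continue
        summary["variable"].append(pos + 1)
        # Informative: at least two states, each present in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            summary["parsimony_informative"].append(pos + 1)
    return summary


if __name__ == "__main__":
    # Toy alignment, not real matK data.
    toy = ["ACGTA", "ACGTA", "ATGCA", "ATGAA"]
    print(classify_sites(toy))
```

Variable sites that are not parsimony-informative carry a substitution found in only one sequence, which is why the number of informative sites is usually much smaller than the number of variable sites.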
DNA barcoding, which is widely applied in taxonomic research today, is invaluable for understanding species boundaries, community ecology, evolution, and biodiversity conservation (Kress et al., 2015). Conservationists have adopted DNA barcodes as a tool in the field of conservation (Chakraborty et al., 2014; Joly et al., 2014). DNA barcoding enables the identification of species boundaries, which can be used as clues in determining target conservation habitats (Faith, 1992). When the DNA barcode for each species is complete, comparative measures of phylogenetic diversity will become the standard metric for assessment in determining conservation strategies (Kress et al., 2015). In addition, DNA barcoding is also used for the identification and detection of illegally traded endangered species (Lahaye et al., 2008b). The use of DNA barcodes is expected to increase in the future, mainly because the available technology will become simpler and cheaper (Kress et al., 2015).

Conclusion
The results showed that the matK gene sequence can be amplified at a length of 800-850 bp. In the A. longifolia matK gene sequence, the AT content was higher than the GC content. The resulting genetic distances ranged from 0 to 0.0014. The alignment of the matK gene sequence showed 1521 observable characters, comprising 1403 conserved sites, 118 variable sites, 9 parsimony-informative sites, and 7 single nucleotide polymorphism (SNP) sites. This suggests that the matK gene sequence has the potential to be developed as a DNA barcode to identify A. longifolia.