DeepSeq Core Labs FAQ section - UMass Chan Medical School

I used the Chromium 10X for my library prep. Do you need anything special to run my libraries?

The Core has run multiple 10X libraries without problems. Please remember that these libraries are SINGLE-indexed, with an inline barcode at the beginning of Read 1 and indexes in the i7 position. Also, each sample has a SET OF FOUR i7 indexes (or sometimes more depending on kit version). We will need to know all the indexes in order to demultiplex them correctly. The Core will have to add at least 5% of base-balancer (phiX) to avoid any problems caused by lack of sequence diversity in the polyA tails.
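As a rough illustration of why we need every index in the set, the sketch below builds a lookup table that sends each i7 index back to its sample. The sample names and index sequences are made-up placeholders (they are NOT real 10X index sets), and the function name is hypothetical; this is not the Core's demultiplexing pipeline.

```python
# Hypothetical sample-to-index table. The sequences are placeholders for
# illustration only; use the index set listed for your own kit.
SAMPLE_INDEXES = {
    "sampleA": ["AACCGGTT", "CCGGTTAA", "GGTTAACC", "TTAACCGG"],
    "sampleB": ["ACACGTGT", "CAGTTGCA", "GTCAACTG", "TGTGCACA"],
}


def build_index_lookup(sample_indexes):
    """Map every i7 index to its sample, rejecting any index listed twice."""
    lookup = {}
    for sample, indexes in sample_indexes.items():
        for index in indexes:
            if index in lookup:
                raise ValueError(
                    f"index {index} is listed for both {lookup[index]} and {sample}"
                )
            lookup[index] = sample
    return lookup


INDEX_TO_SAMPLE = build_index_lookup(SAMPLE_INDEXES)
```

If only one of the four indexes were reported for a sample, reads carrying the other three would have no entry in the table and would end up unassigned.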

Barcodes? Indexes? How and why do I use them?

(See the Resources page for more information and downloadable help documents.)
Illumina instruments can read barcodes and indexes. There are 2 common ways this works.

The "Illumina-style" way to do barcoding/indexing is to insert the index sequence into the adapter and perform an additional sequencing step to prime and read that sequence. This will cost more since there is an additional sequencing read. It will also take slightly longer for the Core to demultiplex the data before delivery. If you use the adapter-based indexes with the additional sequence read, you must provide the Core with a list of sample names and index sequences if you want us to sort them. If making your own adapter oligos, we recommend using the index sequences provided in support documents from Illumina. The Core STRONGLY recommends using >=8bp Dual or Unique Dual indexes for very-high-throughput instruments such as the NovaSeq to ensure all the indexes can be sorted cleanly.

For an "inline" barcode, you add an index or barcode to your sequence when you build the library. It is generally added to the 3' end of the upstream adapter (next to the insert) so that it makes up the first 4-6 bases read during the sequencing run. You then sort the data into your groups by these first bases. If you do this, keep the barcodes to 6 bases or fewer and design them to be as distinct as possible. We recommend designing them so that if one base were misread or the first base were missed, you could still tell which barcode it is (you can also look at commercially available or previously published lists and use those). Additionally, please design a sequence that is NOT part of the sequencing primer or part of your PCR primers, for obvious reasons.
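One way to check the "one misread base" recommendation is to count the differences between every pair of barcodes: if every pair differs at 3 or more positions, a read with a single wrong base is still closest to exactly one barcode. The sketch below does that and sorts a read by its first bases. The barcode sequences and function names are hypothetical examples, and the sketch does not cover the case where the first base is missed.

```python
from itertools import combinations

# Hypothetical 6-base inline barcodes, for illustration only.
BARCODES = ["ACGTAC", "CATGCA", "GTACGT", "TGCATG"]


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest number of differing bases between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the single barcode matching the start of the read, else None.

    A read that matches no barcode, or more than one, is left unassigned.
    """
    hits = [
        bc
        for bc in barcodes
        if len(read) >= len(bc) and hamming(read[: len(bc)], bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None
```

With this example set, min_pairwise_distance(BARCODES) is 6, and assign_barcode("ACGAACTTTT", BARCODES) still returns "ACGTAC" even though the fourth base was misread.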

In any style of index, please design and mix to provide the least base overlap (to prevent mis-matches) and the most base diversity.
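To get a quick look at base diversity, you can tally which bases appear at each position across the indexes you plan to pool. The following is a minimal sketch; the function names and the min_distinct_bases threshold are assumptions made for illustration, not a Core requirement.

```python
from collections import Counter


def base_composition_by_position(indexes):
    """Return one Counter of bases per position across all indexes."""
    length = len(indexes[0])
    if any(len(index) != length for index in indexes):
        raise ValueError("all indexes must have the same length")
    return [Counter(index[pos] for index in indexes) for pos in range(length)]


def low_diversity_positions(indexes, min_distinct_bases=2):
    """Return the 0-based positions where too few different bases appear."""
    return [
        pos
        for pos, counts in enumerate(base_composition_by_position(indexes))
        if len(counts) < min_distinct_bases
    ]
```

For example, low_diversity_positions(["ACGTAC", "ACTGCA"]) returns [0, 1], because both indexes start with "AC" and so every cluster would show the same base in those two cycles.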

I'm using a 3rd-party (non-Illumina) kit to build my libraries. Are you able to run my samples?

If the kit can build Illumina libraries, we can run them. If your kit comes with a custom sequencing primer, please let the Core know, and provide a tube of the primer (at the stock 100uM concentration when possible). Sending a part number and/or kit manual is always helpful to make sure we will have all the necessary primer and index sequences.
If you need a run type that is not offered on the ticket/website, please contact the Core to discuss whether the run is customizable.

My library requires a custom sequencing primer. Can I use one at the Core?

The Core will use custom sequencing primers if they meet our requirements. Custom primer sequences should be sent to the Core lab staff for approval before the library is submitted for analysis. The primer must have the Tm needed for the instrument it will run on.
Unless the library is only meant for the MiSeq, custom primers for anything other than Read1 cannot have a sequence that will compete with binding of standard Illumina primers. We require a minimum 10ul of 100uM Read1 primer, and 40ul of 100uM primer for other reads.

When you say a "clean library" runs better, what do you mean by "clean" and how do I make mine that way? Why does it matter?

Carryover from many common molecular biology reagents can interfere with cluster generation and decrease the quality of your data. Avoiding cross-contamination is also important. A little bit of garbage/unwanted DNA going into the library means MILLIONS of junk sequences coming out of your analysis!
Some tips for getting a "clean" library: do NOT use ethanol precipitations, do NOT add glycogen or LPA, and prewash spin columns (as per their manual) if you are using them. Do not reuse gel box reagents to gel-purify a library, and keep ladder lanes away from library lanes on a gel.
The Core recommends bead cleanups when building sequencing libraries. Illumina kits now use this as the standard cleanup method as well. If you are not using a kit, here is a protocol to follow for library cleanup using AMPure XP beads. Gel purifications may also be used, but we recommend a bead or column wash afterward to ensure there is no carryover of reagents.

Can I submit several samples and decide later which one(s) to use?

Samples submitted for Deep Sequencing are entered directly into that workflow. If you remove a sample from the Deep Sequencing queue you will be billed for the QC and a processing fee, since the QC cost will no longer be covered by the run cost. If you are unsure of the quality of your samples, you can submit them to the Fragment Analyzer service first.

Where is the Fragment Analyzer data for my sample? Am I billed for that? Will you send me the QC results for my sequencing sample?

The Fragment Analyzer trace is part of the initial QC done by the core lab before your sample is put on one of the next-gen sequencers. If you request it, we will send a copy to you for your information and as part of your data set.
If there is a problem with your sample which might affect its performance, we will contact you before running and you will have the opportunity to adjust the sample (concentrate it, further size-select, etc.). A quick response (within 48 hours please) is important to keep your sample moving through the queue. If we do not hear back from you, the Core will make a judgment call on moving forward with the analysis.
You are not billed for the Fragment Analyzer analysis, which is part of sample QC. However, if you submit samples in order to get Fragment Analyzer info before moving forward, or to help select which libraries to sequence and which to withdraw, and then remove sample(s) from the Deep Sequencing queue, we will invoice you for the Fragment Analyzer run(s) and a processing fee, as these use reagents and resources. If you need to see how your library looks when finished, or better yet during stages of construction, we strongly recommend using the MBCL service.
If you want to have samples run on the Fragment Analyzer for any reason, the MBCL offers a very economical service for RNA and DNA/genomic samples (http://www.umassmed.edu/nemo/mbcl/fragment-analyzer-service/).

Why do HiSeq4000 run types cost more than the runs on the HS2000 and MiSeq?

The reagents cost more, the flowcells cost more, and overall they cost more to operate. On the other hand, if you build a good library, you'll get a lot more data! The price PER BASE is actually cheaper. As with the other instruments, the number of good reads is determined by the length of the library (length is figured into how many clusters can be seeded), how tightly the library is sized, and the content and quality of the library.

What are the prices for the various types of runs/analyses available?

Pricing for UMass investigators can be obtained by sending an email to DeepSequencingCoreLabs@umassmed.edu and asking for a price list. Investigators at other institutions should contact us for pricing as there are different schedules set by the administration for outside users.
WHY AREN'T THE PRICES ON THE WEBSITE? These web pages are viewable outside UMassMed, and our agreements with vendors for discounted reagents etc. (which we pass on to our users) prevent us from publishing those prices outside our group.

What happens to my sample once I submit it?

The first part of the queue includes sample submission, login, and initial QC.
When you submit a sample, it is with the understanding that you are ready to go and have done your validation sequencing and workup. Submitting several samples "just to see how they look" is not good practice. You may send them to the Fragment Analyzer service or any other QC you choose ahead of submission. The initial QC is usually a Fragment Analyzer trace, possibly with a Q-PCR or Qubit assay depending on the platform and type of sample you submitted. This procedure usually takes 1-2 BUSINESS days. Once the sample passes QC to our satisfaction, we move to the next step. If it does not pass, you will receive a note with a copy of the trace data. You can then work with us to further process the sample or you can ask for it to be run as-is but without any assurance of quality or output.
The next part of the queue is the time from QC-pass to the start of sequencing.
The wait depends on several factors, including instrument availability and reagent status (if backordered), but the most critical is the number of other samples waiting for the same type of run. If you turn in a single single-read 100 (SR100) sample, you might wait longer, since we don't get as many of those. Conversely, if you turn in some PE50 samples, you might wait while all the other PE50s ahead of you are run. Samples are run on a first-in, first-on basis. Lots of samples ahead of you means a longer wait time. There are 8 lanes on an Illumina HiSeq flow cell. If you are waiting for the HiSeqs, please note that we need 7 samples to start up a run (lane 8 is a requisite control).
The next part of the queue is the time to perform the cluster generation, chemistry, and image each cycle.
This varies by platform (HiSeq vs MiSeq). If your read is longer it will take more time. If you are doing a paired read, add one more business day for the building of the reverse orientation.
The final stages are the pipeline which does the basecalling, and data transfer.
Pipeline and transfer times vary depending on run length, the activity on HPCC (the High Performance Computing Cluster), and other computer phenomena. Some of these factors (such as HPCC down-time or data transfer using other methods like portable hard drives), are beyond our control.

When will my sample be run? How long is the queue? Why don't I have my data yet? Can I move up in the queue? etc etc

We operate on a first-in, first-on basis for our investigators. However, we cannot always control the SPEED at which the queue moves forward. We need to fill all the lanes on multi-lane flowcells. If instruments are not performing up to specs, they are taken off-line and serviced (otherwise you get bad data). When there are issues related to infrastructure, computing, instrument failure, power failure, etc. and we need to reset a run, this also delays the queue. The lab staff cannot change the queue, so please don't ask them.

Where is my sample in the queue? Is there an easy way to find out? How is it doing?

At this time, the only way to find out where a sample is in this process is to email us at DeepSequencingCoreLabs@umassmed.edu; please give us 1 business day to respond. We can't give you any performance info until the pipeline is finished.

I'm in a hurry and have a deadline! What can I do?

IF YOU ARE IN A HURRY, YOU MIGHT CONSIDER SELECTING SEVERAL CONFIGURATIONS ON YOUR TICKET AND CHECKING ON THE TICKET "USE FIRST AVAILABLE"
For example, your SR50 might be analyzed as an SR100 instead, or vice versa. If you note that on the ticket, you'll get onto whichever acceptable run type goes next. (In this case, you will be invoiced for whichever run type is used.)
IF YOU ARE IN A BIG BIG HURRY ....
Since the reason for the queue and the wait is to fill the flowcells and keep costs down, you could fill the flowcell. If you are willing to purchase the unused lanes and fill them with archived samples of yours which might benefit from another run or even run the new sample in several lanes for additional depth, then let us know. We will work with you to get things going as fast as we can.

How long will it take?

The queue for a short-insert, well made library is usually a couple of weeks (and could be less than one week for MiSeq runs). The queue for a library with long inserts which is poorly made or which has a wide range of insert sizes can be over a month, especially if there needs to be effort spent on getting it into a useable condition.
We QC using the Fragment Analyzer and if there are concerns at this point we contact you immediately. The sooner you can respond and help with any adjustments, the quicker the sample can move to the next step.

What can I do about getting my sample through the queue and my data as fast as possible?

Make the best library you can! Build a good, tightly-sized, clean library and share information with us about any residual linkers, primer sequences, barcodes, UMIs, adapters, polyA stretches, and normalization controls. Even the results of your test sequencing of the Topo clones can be helpful. Getting through the queue fast depends on passing the Core's QC. When libraries don't sequence well, they end up getting reworked and rerun, which translates to more expense for the investigator and longer queue times.
Other than that, we are a first-come, first-served facility. We cannot prioritize your sample ahead of others.
The guidelines for library construction and validation are there to ensure good performance. The Topo cloning and pre-sequencing is important to determine whether the linkers, adapters, sequence priming site, and attachment sequences are in place. The sizing guidelines are extremely important. If your sample covers a wide size range, it's better to make several size cuts and turn in several sub-libraries.
If a user needs to analyze an inordinate number of samples which would set back everyone else, we will work with them to create a schedule that does not negatively impact the other users. (So if you have 120 samples, email us in advance please. This will also give us as much notice as possible so we can have the reagents in stock and will be ready to get them moving as soon as they arrive).

What happens to my sample after it has been run?

Your library will be archived in our storage for a period of time. Please contact us if you need your sample returned to you for further analysis. We are not able to archive your data.

If my run fails, do I have to pay for it? Why am I billed for a run if it wasn't as much data as I need?

You pay for a "run", not a fixed amount of data. Billing is not determined by the output, in other words you don't pay by the megabase. Billing is also not dependent upon performance. We are an at-cost not-for-profit facility. Running your sample uses reagents and supplies so you are billed for the sample/run.
If there is a failure due to reagents or instrument performance, we will re-run that sample at no additional charge. If the failure was due to library construction issues (e.g. you used non-modified home-made primers) or some other problem related to the sample, you pay for any re-runs (after the problem is fixed, hopefully). We will do our best to help with troubleshooting.

Is there any way I can evaluate the run quality? Do you have anything I can run on my own computer to look at the data without using the High Performance Cluster (HPCC)?

Please go to our Resources in the navigation menu, and see the Data/Bioinformatics section. We are collecting useful information and always open to new suggestions.

Why do I see lots of "BBBB" in my fastq quality info when the sequences map great and appear to be good reads?

This is a very good question, one which we at the core lab have been asking as well. It appears to be an "undocumented feature"!! There is some chat about it online (try this for a start, thanks David, http://news.open-bio.org/news/2010/04/illumina-q2-trim-fastq/). We see it most often when there are non-random bases at the ends of sequences. Non-random bases (e.g. linkers) raise flags for the analysis software. The base calls may be (and usually are) good, but they don't look like the rest of the run to the instrument. We welcome feedback about your experiences with this so we can share details with Illumina. Please continue to keep us updated about your data analysis experiences.
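If your downstream tools are thrown off by the "B" tail, one common workaround is to clip it before further analysis. The sketch below assumes your fastq uses the older Illumina quality encoding (offset 64), where "B" stands for Q2; in offset-33 files "B" is an ordinary good score and must not be clipped this way. The function name is hypothetical.

```python
def trim_trailing_b(sequence, quality, marker="B"):
    """Remove the trailing run of marker characters from a read.

    Only the run at the 3' end is removed; a marker inside the read is kept.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    keep = len(quality.rstrip(marker))
    return sequence[:keep], quality[:keep]
```

For example, trim_trailing_b("ACGTACGT", "hhhhgBBB") returns ("ACGTA", "hhhhg"). Whether to clip at all is a judgment call, since (as noted above) the bases in the tail are usually good.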

In the ChIP protocol from the core lab, what do you mean by "reversing the orientation of the column" when you elute?

This refers to rotating the plate or column 180 degrees so that the area near the spindle is now at the furthest point from the center of the rotor. This does not mean to turn it upside down!

Is it necessary to order the adapters and primers for the Illumina libraries from the vendor or the core lab or can I make my own?

Adapters can be home-made and in fact this is the best way to add an internal barcode. Just be sure the phosphate is on the 5' end of the downstream adapter. Email the Core or use your Illumina account to obtain a copy of the latest adapter sequence list. If using custom adapters, remember that acceptable Tm differs by instrument, so you must consider that during the design stage. They also should not interfere with any of the primers in the standard Illumina mixes. It is recommended that you consult with the Core on custom adapter sequences.

Adapters? Primers? What works with what?

The TruSeq DNA/RNA, old PE (Paired-End) Genomic DNA, Nextera, and old Paired-End Multiplexing adapters work on all Illumina platforms and can be read as a single read or paired read.
TruSeq small RNA adapters will work on all platforms, but can only be used for a single forward read.
The OLD small RNA adapters work only for a forward read on Single Read flowcells on the HiSeq2000 or GA, and we strongly recommend you do not use them to build new libraries.
IF YOU USE THE ILLUMINA ADAPTERS WITH THE INTERNAL INDEXES, AND YOU WANT THOSE READ, YOU MUST ORDER A MULTIPLEX READ when you submit your sample. This is an extra priming and sequencing read and will cost extra.
See the FAQ about indexes vs barcodes if you are interested in doing the barcoding "inline" within the main fragment read.

Do I really need to size my library and keep it +/-25bp from the median size? WHY?

YES, it's a very good thing to do if you want the most and best sequence possible. We load based on the largest fragment in the group, so if the size spread is wide you'll get fewer clusters/sequences. Cluster detection is more accurate when all clusters are the same size (wide range of fragment sizes results in wide range of cluster sizes). Additionally, very large fragments don't stay denatured as long and will often just roll through the flowcell or even interfere with the annealing of the smaller fragments.
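If you have a list of fragment sizes for your library (for example, peak sizes read off a trace), a quick check against the +/-25bp guideline could look like the sketch below. The function name and the input format are assumptions made for illustration; this is not how the Core evaluates a trace.

```python
from statistics import median


def sizes_outside_window(fragment_sizes_bp, tolerance_bp=25):
    """Return the fragment sizes more than tolerance_bp away from the median."""
    mid = median(fragment_sizes_bp)
    return [size for size in fragment_sizes_bp if abs(size - mid) > tolerance_bp]
```

For example, sizes_outside_window([280, 290, 300, 310, 320, 400]) returns [400]: the median is 305, so the 400bp fragment falls well outside the window and would be a candidate for a tighter size cut.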

I am preparing a grant application. How do I obtain a letter of support for sequencing?

Please contact Maria.Zapp@umassmed.edu for a letter of support.