MicroRNAs (miRNAs) are key regulators of gene expression and contribute to a
variety of biological processes. Abnormal miRNA expression has been reported in
various diseases, including breast cancer, where miRNAs regulate
protumorigenic processes including vascular invasiveness, estrogen receptor
status, chemotherapy resistance, invasion and metastasis. The miRBase sequence
database, a public repository for newly discovered miRNAs, has grown rapidly,
with more than 10,000 entries to date. Despite this rapid growth, many
miRNAs have not yet been validated, and several others are yet to be identified.
The lack of a full complement of miRNAs has limited recognition of
their important roles in cancer, including breast cancer. Using deep sequencing
technology, we have identified 189 candidate novel microRNAs in human breast
cancer cell lines with diverse tumorigenic potential. We further show that
analysis of 500-nucleotide pri-microRNA secondary structure constitutes a
reliable method to predict bona fide miRNAs as judged by experimental
validation. Candidate novel breast cancer miRNAs with stem lengths greater
than 30 bp generated precursor and mature sequences in vivo, whereas
candidates with stem lengths less than 30 bp were less efficient in
producing mature miRNA. This approach may be used to predict, with
approximately 90% accuracy, which candidate novel miRNAs from deep sequencing
data would qualify as bona fide miRNAs.
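
The stem-length criterion described above can be illustrated with a minimal sketch. It assumes the predicted secondary structure of a candidate is already available as a dot-bracket string from an RNA folding program (the abstract does not name one). The function names, the `max_gap` tolerance for bulges and internal loops, and the way stem length is counted (base pairs in the longest unbranched helix) are illustrative assumptions, not the authors' published procedure. Only the 30 bp threshold comes from the text.

```python
def pair_table(dot_bracket):
    """Map each opening position to its closing partner in a dot-bracket string."""
    stack, partner = [], {}
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            partner[stack.pop()] = pos
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return partner


def longest_stem_bp(dot_bracket, max_gap=3):
    """Count base pairs in the longest unbranched stem.

    Assumption: bulges and internal loops of up to `max_gap` unpaired
    nucleotides per strand do not interrupt a stem.
    """
    partner = pair_table(dot_bracket)
    stem_len = {}
    best = 0
    # Visit inner pairs first so their stem lengths are known to outer pairs.
    for i in sorted(partner, reverse=True):
        j = partner[i]
        p = i + 1
        while p < j and dot_bracket[p] == ".":
            p += 1
        n = 1
        if p < j and dot_bracket[p] == "(":
            q = partner[p]
            left_gap = p - i - 1
            right_gap = j - q - 1
            # The stem continues only if nothing but a short unpaired
            # stretch separates this pair from the next inner pair.
            unbranched = all(c == "." for c in dot_bracket[q + 1:j])
            if unbranched and left_gap <= max_gap and right_gap <= max_gap:
                n += stem_len[p]
        stem_len[i] = n
        best = max(best, n)
    return best


def passes_stem_filter(dot_bracket, threshold_bp=30, max_gap=3):
    """True if the longest stem is greater than the 30 bp threshold.

    The text does not say how a stem of exactly 30 bp is treated; this
    sketch treats it as not passing.
    """
    return longest_stem_bp(dot_bracket, max_gap) > threshold_bp


if __name__ == "__main__":
    hairpin = "(" * 31 + "....." + ")" * 31  # hypothetical structure
    print(longest_stem_bp(hairpin), passes_stem_filter(hairpin))
```

In practice the input would be the structure predicted for each 500-nucleotide pri-miRNA window, and the choice of `max_gap` would need to be calibrated against experimentally validated miRNAs.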